Abstract

XML database query languages have been studied extensively, but 
XML database updates have received relatively little attention, and 
pose many challenges to language design. We are developing an 
XML update language called FLUX, which stands for FunctionaL 
Updates for XML, drawing upon ideas from functional program- 
ming languages. In prior work, we have introduced a core language 
for Flux with a clear operational semantics and a sound, decidable 
static type system based on regular expression types. 

Our initial proposal had several limitations. First, it lacked sup- 
port for recursive types or update procedures. Second, although a 
high-level source language can easily be translated to the core lan- 
guage, it is difficult to propagate meaningful type errors from the 
core language back to the source. Third, certain updates are well- 
formed yet contain path errors, or "dead" subexpressions which 
never do any useful work. It would be useful to detect path errors, 
since they often represent errors or optimization opportunities. 

In this paper, we address all three limitations. Specifically, we 
present an improved, sound type system that handles recursion. We 
also formalize a source update language and give a translation to 
the core language that preserves and reflects typability. We also 
develop a path-error analysis (a form of dead-code analysis) for 
updates. 

Categories and Subject Descriptors D.3.1 [Programming Languages]:
Formal Definitions and Theory; H.2.3 [Database management
systems]: Languages - data manipulation languages

General Terms Languages 

Keywords XML, update languages, type systems, static analysis 

1. Introduction 

XQuery is a World Wide Web Consortium (W3C) standard, typed, 
purely functional language intended as a high-level interface for 
querying XML databases. It is meant to play a role for XML 
databases analogous to that played by SQL for relational databases. 
The operational semantics and type system of XQuery 1.0 have been
formalized (Draper et al. 2007), and the W3C recently endorsed the
formal semantics as a recommendation, the most mature phase for 
W3C standards. 

Almost all useful databases change over time. The SQL stan- 
dard describes a data manipulation language (DML), or, more 
briefly, update language, which facilitates the most common pat- 
terns of changes to relational databases: insertion, deletion, and 
in-place modification of rows, as well as addition or deletion of 
columns or tables. Despite the effectful nature of these operations, 
their semantics is still relatively clear and high-level. SQL up- 
dates are relatively inexpressive, but they are considered sufficient 
for most situations, as witnessed by the fact that in many SQL 
databases, data can only be updated using SQL updates in transactions.
Moreover, the presence of SQL updates does no damage to the purely
functional nature of SQL queries: updates are syntactically distinct
from queries, and the language design and transactional mechanisms
ensure that aliasing difficulties cannot arise,
even when an update changes the structure of the database (for ex- 
ample, if a column is added or removed from a table). 

The XQuery standard lacks update language features analogous 
to SQL's DML. While XML querying has been the subject of a 
massive research and development effort, high-level XML update 
languages have received comparatively little attention. Many programming
languages for transforming immutable XML trees have been studied,
including XML stylesheets (XSLT (Clark 1999)) and XML programming
languages such as XDuce, CDuce, Xtatic, or OCamlDuce (Hosoya and Pierce
2003; Benzaken et al. 2003; Gapeyev et al. 2006; Frisch 2006). However,
these languages are
not well-suited to specifying updates. Updates typically change 
a small part of the document and leave most of the data fixed. 
To simulate this behavior by transforming immutable XML values,
one must explicitly describe how the transformation preserves
unchanged parts of the input. Such transformations are typically
executed by building a new version of the document and then
replacing the old one. This is inefficient when most of the data 
is unchanged. Worse, XML databases may employ auxiliary data 
structures (such as indices) or invariants (such as validity or key 
constraints) which need to be maintained when an update occurs, 
and updating a database by deleting its old version and loading a 
new version forces indices and constraints to be re-evaluated for 
the whole database, rather than incrementally. 

Instead, therefore, several languages specifically tailored for up- 
dating XML data in-place have been proposed. While the earliest 
proposal, called XUpdate (Laux and Martin 2000), was relatively 
simple and has been widely adopted, it lacks first-class conditional 
and looping constructs. These features have been incorporated 
into more recent proposals (Tatarinov et al. 2001; Sur et al. 2004;
Ghelli et al. 2006; Chamberlin et al. 2006; Ghelli et al. 2007b).
The W3C is also developing a standard XQuery Update Facility
(Chamberlin et al. 2008).

Although they have some advantages over XUpdate, we argue 
that these approaches all have significant drawbacks, because they 
unwisely combine imperative update operations with XQuery's 
purely-functional query expressions. We shall focus our critique 
on XQuery!, since it is representative of several other proposals, 
including the W3C's XQuery Update Facility. 

A defining principle of XQuery! is that update operations 
should be "fully compositional", which Ghelli et al. (2006) take
to mean that an update operation should be allowed anywhere in 
an XQuery expression. Thus, the atomic update operations such 
as insertion, deletion, replacement, renaming, etc. may all appear 
within XQuery's query expressions. Node-identifiers can be used 
as mutable references. To avoid nondeterminism, XQuery! fixes a 
left-to-right evaluation order and employs a two-phase semantics 
that first collects updates into a pending update list by evaluating 
an expression without altering the data, and then performs all of
the updates at once. An additional operator called snap provides
programmer control over when to apply pending updates.

Figure 1. XQuery! examples

XQuery! seems to sacrifice most of the good properties of 
XQuery. Most equational laws for XQuery expressions are invalid 
for XQuery!, and the semantics is highly sensitive to arbitrary 
choices. For example, consider the following XQuery! update. 



for $x in $doc//a,
    $y in $doc//b
return (insert $y into $x, delete $x//c)



Its behavior on two trees is shown in Figure 1. In the first example,
consider input tree (a) with regular structure. Running the above 
update on this tree yields the update sequence: 



insert(1, <b><d/></b>)    delete(4)
insert(1, <b><e/></b>)    delete(4)
insert(2, <b><d/></b>)    delete(6)
insert(2, <b><e/></b>)    delete(6)



The numbers refer to the node identifiers shown in Figure 1(a)
as superscripts. When these updates are performed, the result is
output (b). Note that each subtree labeled a in the output contains
three b-subtrees, one corresponding to the original b and one for
each occurrence of b in the tree. In the second example, tree (c) is 
transformed to (d) via updates 

insert(1, <b><c/></b>); delete(3);
insert(1, <b><a><c/></a></b>); delete(3);
insert(5, <b><c/></b>); delete(6);
insert(5, <b><a><c/></a></b>); delete(6);

Observe that both occurrences of a have as subtrees both occur- 
rences of b in the input. This is because the snapshot semantics of 
XQuery! first collects the updates to be performed, then performs 
them in sequence. Inserts always copy data from the original ver- 
sion of the document, whereas deletes mark nodes in the new ver- 
sion of the document for deletion. This is why some occurrences 
of c remain below occurrences of a. Although this is not an update 
that a user would typically write, an implementation, type system, 
or static analysis must handle all of the expressions in the language, 
not just the well-behaved ones. 

Furthermore, the XQuery! approach seems quite difficult to 
statically typecheck. There are several reasons for this. First, as 
the examples in Figure 1 show, the structure of the result can
depend on the data in ways difficult to predict using types. Second, 
XQuery! also permits side-effects to be made visible before the end 
of an update, using a "snapshot" operator called snap. The snap
operator forces all of the delayed side-effects of an expression to
be performed. This means that the values of variables may change 
during an update, so it would be necessary to update the types of 
variables to typecheck such updates. Since variables may alias parts 
of the document, this requires a nontrivial alias analysis. 

We argue that the combination of features considered in XQuery!
and similar proposals is unnecessarily complex for the problem of
updating XML databases. While high expressiveness is certainly a 
reasonable design goal, we believe that for XML database updates, 
expressiveness must be balanced against other concerns, such as 
semantic transparency and static typechecking. We believe that it 
is worthwhile to consider an alternative approach that sacrifices 
expressiveness for semantic clarity and the ability to typecheck and 
analyze typical updates easily. 

In previous work (Cheney 2007), we introduced a core FunctionaL
Update language for XML, called FLUX.¹ FLUX is functional in the
same sense that imperative programming in Haskell using monads² is
functional. Side-effects may be present, but they are encapsulated
using syntactic and type constraints, so that queries remain purely
functional. FLUX provides sufficient expressive power to handle most
common examples of XML database-style updates (e.g. all of the
relational use cases in the XQuery Update Facility Requirements
(Chamberlin and Robie 2005)), while avoiding complications resulting
from the interaction of unrestricted iteration and unconstrained
side-effects.

FLUX admits a relatively simple, one-pass operational semantics, so
side-effects can be performed eagerly; it can also be typechecked
using regular expression types, extending previous work on
typechecking XQuery (Hosoya et al. 2005; Colazzo et al. 2006; Draper
et al. 2007). The decidability of typechecking for the core language
(with recursive types and functions) was later established by Cheney
(2008) along with a related result for an XQuery core language.
However, our preliminary proposal (Cheney 2007) had several
limitations. This paper presents an improved design. In particular,
our contributions relative to prior work are:

• We extend the core language with recursive types and update 
procedures and provide a sound type system. 

• We adapt the idea of path-error analysis, introduced by Colazzo et al.
(2006) for a core XML query language, to the setting of updates,
and design a correct static analysis for conservatively
under-approximating update path-errors.

• We formalize a high-level FLUX source language and show how 
to translate to core FLUX. We also present a source-level type 
system and prove that a source update is well-formed if and 
only if its translation is well-formed. 

The structure of the rest of this paper is as follows. Section 2
briefly recapitulates the FLUX source language introduced in
(Cheney 2007) with a few examples. Section 3 formalizes the core
language and its operational semantics; its type system is presented
and proved sound in Section 4. Section 5 presents the translation
from the high-level FLUX language to Core FLUX and shows how
to typecheck high-level updates. Section 6 presents the path-error
analysis. We provide a more detailed comparison with related
approaches in Section 7. Section 8 presents extensions and future
work, and Section 9 concludes.

Certain definitions and proofs have been placed in appendices. 



¹ Originally Lux, for "Lightweight Updates for XML".

² Technically, FLUX's approach to typechecking updates is closer to
arrows (Hughes 2000); however, we will not investigate this relationship in
detail here.



Stmt ::= Upd [WHERE Expr]
       | IF Expr THEN Stmt
       | Stmt; Stmt
       | LET Var := Expr IN Stmt
       | {Stmt}
Upd  ::= INSERT (BEFORE|AFTER) Path VALUE Expr
       | INSERT AS (LAST|FIRST) INTO Path VALUE Expr
       | DELETE [FROM] Path
       | RENAME Path TO Lab
       | REPLACE [IN] Path WITH Expr
       | UPDATE Path BY Stmt
Path ::= . | Lab | node() | text()
       | Path/Path | Var AS Path | Path[Expr]

Figure 2. Concrete syntax of FLUX updates. 



2. Overview and examples 

2.1 Syntax 

As with many similar languages, particularly SQL, XQuery (Draper et al.
2007), and CPL+ (Liefke and Davidson 1999), we will introduce a
high-level, readable source language syntax which we will translate 
to a much simpler core language. We will later formalize the oper- 
ational semantics and type system for the core language. In what 
follows, we assume familiarity with XQuery and XPath syntax, and 
with XDuce-style regular expression types. 

The high-level syntax of FLUX updates is shown in Figure 2.
XQuery variables Var are typically written $x, $y, etc. We omit the
syntactic class Expr consisting of ordinary XQuery expressions.
Statements Stmt include conditionals, let-binding,
sequential composition, and update statements Upd, which may 
be guarded by a WHERE-clause. We use braces to parenthesize 
statements {Stmt}. Updates Upd come in two flavors, singular 
and plural. Singular updates expect a single tree and are executed 
once for each selected tree; plural updates operate on arbitrary 
sequences and are executed on the children of each selected tree. 
Singular insertions (INSERT BEFORE/AFTER) insert a value before
or after each node selected by the path expression, while plural 
insertions (INSERT AS FIRST/LAST INTO) insert a value at the 
beginning or end of the child-list of each selected node. Similarly, 
singular deletes (DELETE) delete individual nodes selected by the 
given Path, whereas plural deletes (DELETE FROM) delete the child- 
list of each selected node. Singular replacement REPLACE WITH 
replaces a subtree, while plural replacement REPLACE IN replaces 
the content of a path with new content. The renaming operation 
RENAME TO is always singular; it renames a subtree's label. The 
UPDATE Path BY Stmt operation is singular; it applies Stmt to 
each tree matching Path. Update procedure declarations are not 
shown but can be added easily to the source language. 

The path expressions Path in FLUX are based on the XPath 
expressions that are allowed in XQuery. Paths include the empty 
path ., sequential composition Path/Path, the XPath child axis
tests (text(), Lab, and node()), filters (Path[Expr]), and variable
binding steps (Var AS Path). The "as" path expression
$x AS Path (not present in XPath) binds the subtree matching
Path to $x in each iteration of a path update. We often write *
instead of node(). We only describe the syntax of paths used to
perform updates; arbitrary XPath or XQuery expressions may be 
used in subqueries Expr. 

Both the FLUX source language described here and the core 
language introduced later are case-insensitive with respect to key- 
words (like XQuery); however, we use uppercase for the source 
language and lowercase for the core language to prevent confusion. 



2.2 Execution model, informally 

In general, an update is evaluated as follows: The path expression 
is evaluated, yielding a focus selection, or a set of parts of the 
updatable store on which the update focuses. The WHERE-clause, 
if present, is evaluated with respect to the variables bound in the 
path and if the result is true then the corresponding basic update 
operation (insert, delete, etc.) is applied to each element of this 
set in turn. Order of evaluation is unspecified and the semantics 
is consistent with parallel evaluation of iterations. 

Unlike most other proposals, in FLUX, arbitrary XPath or 
XQuery expressions cannot be used to select foci. If this were 
allowed, it would be easy to construct examples for which the re- 
sult of an update depends on the order in which the focus selection 
is processed. For example, suppose the document is of the form 
<a><b/></a>. If the following update were allowed: 

UPDATE //* BY { DELETE a/b; RENAME * TO c }

then the result would depend on the order in which the updates 
are applied. Two possible results are <c/> and <c><c/></c>. This 
nondeterministic behavior is difficult to typecheck. For this reason, 
we place severe restrictions on the path expressions that may be 
used to select foci. 
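The order dependence can be reproduced with a small simulation. The following Python sketch is only an illustration under stated assumptions: the Node class, the body function, and the reading of DELETE a/b as "if the focus is labeled a, remove its b-children" are our own modeling choices, and the selection for //* is assumed to be computed once, before any update runs.

```python
class Node:
    """A mutable element node: a label plus a list of child nodes."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])

def show(node):
    """Serialize a node as XML text."""
    if not node.children:
        return f"<{node.label}/>"
    inner = "".join(show(child) for child in node.children)
    return f"<{node.label}>{inner}</{node.label}>"

def body(focus):
    """{ DELETE a/b; RENAME * TO c } applied to one focused node (assumed reading)."""
    if focus.label == "a":  # DELETE a/b: drop b-children of an a-labeled focus
        focus.children = [c for c in focus.children if c.label != "b"]
    focus.label = "c"       # RENAME * TO c

def run(order):
    """Build <a><b/></a>, select both nodes up front, then visit them in `order`."""
    b = Node("b")
    a = Node("a", [b])
    foci = {"a": a, "b": b}  # the selection that //* would produce
    for name in order:
        body(foci[name])
    return show(a)

print(run(["a", "b"]))  # outer node first
print(run(["b", "a"]))  # inner node first
```

Visiting the outer node first removes the inner node before it can be renamed, while visiting the inner node first renames it so that the later DELETE a/b no longer matches it; the two runs print the two results mentioned above.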

We identify two key properties which help to ensure that up- 
dates are deterministic and can be typechecked. First, an update 
can only modify data at or beneath its current focus. We call this 
the side-effect isolation property. For example, navigating to the 
focused value's parent and then modifying a sibling is not allowed. 
In addition, whenever we perform an iterative update traversing a 
number of nodes, we require that the result of an iterative update is 
independent of the order in which the nodes are updated. We call 
this the traversal-order independence property. 

To ensure isolation of side effects and traversal-order indepen- 
dence, it is sufficient to restrict the XPath expressions that can be 
used to select foci. Specifically, only the child axis³ is allowed, and
absolute paths starting with / cannot be used to backtrack to the 
root of the document in order to begin iterating over some other 
part. This ensures that only descendants of a given focused value 
can be selected as the new focus and that a selection contains no 
overlapping parts. Consequently, the side effects of an update are 
confined to the subtrees of its focus, and the result of an iteration is 
independent of the traversal order. This keeps the semantics deter- 
ministic and helps make typechecking feasible. 
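To make the effect of the child-axis restriction concrete, here is a minimal Python sketch of a value-based path update. It is not the FLUX implementation: the function update_at, the encoding of trees as (label, children) pairs, and the sample database are assumptions made for illustration. Because each path step only descends to children, the selected trees never overlap, so visiting them left to right or right to left yields the same result.

```python
def update_at(forest, path, f, reverse=False):
    """Apply f (tree -> forest) to each tree selected by a child-only path.

    `forest` is a list of trees; a tree is a (label, children) pair or a string.
    `path` is a non-empty list of labels, one child step per label.
    """
    label, rest = path[0], path[1:]
    indices = range(len(forest))
    results = {}
    for i in (reversed(indices) if reverse else indices):
        tree = forest[i]
        if isinstance(tree, tuple) and tree[0] == label:
            if rest:   # descend one level and keep selecting
                results[i] = [(label, update_at(tree[1], rest, f, reverse))]
            else:      # this tree is a focus
                results[i] = f(tree)
        else:          # unselected trees are preserved as-is
            results[i] = [tree]
    return [t for i in indices for t in results[i]]

# Hypothetical sample data for the illustration.
db = [("books", [("book", [("title", ["A Tale of Two Cities"])]),
                 ("book", [("title", ["Alice in Wonderland"])])]),
      ("authors", [])]

def add_publisher(book):
    # Append a publisher element as the last child of the focused book.
    return [(book[0], book[1] + [("publisher", ["Grinch"])])]

forward = update_at(db, ["books", "book"], add_publisher)
backward = update_at(db, ["books", "book"], add_publisher, reverse=True)
restored = update_at(forward, ["books", "book", "publisher"], lambda t: [])
```

In this sketch each call to f sees only its own subtree, which mirrors side-effect isolation; forward and backward are equal, and deleting the inserted elements gives back the original value.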

2.3 Examples 

Suppose we start an XML database with no pre-loaded data; its 
type is db[()]. We want to create a small database listing books and
authors. The following FLUX updates accomplish this: 



U1 : INSERT AS LAST INTO db VALUE books[] ;
     INSERT AS LAST INTO db VALUE authors[]

After this update, the database has type

     books[], authors[]

Suppose we want to load some XML data into the database. 
Since XML text is included in XQuery's expression language, we 
can just do the following: 

U2 : INSERT INTO books VALUE
       <book><author>Charles Dickens</author>
             <title>A Tale of Two Cities</title>
             <year>1858</year></book>
       <book><author>Lewis Carroll</author>
             <title>Alice in Wonderland</title>
             <year>??</year></book> ;
     INSERT INTO authors VALUE
       <author><name>Charles Dickens</name>
               <born>1812</born>
               <died>1870</died></author>
       <author><name>Lewis Carroll</name>
               <born>1832</born>
               <died>1898</died></author>

³ The attribute axis can also be handled easily, but the descendant, parent,
and sibling axes seem nontrivial to handle.

This results in a database with type 

     books[ book[ author[string], title[string],
                  year[string] ]* ],
     authors[ author[ name[string], born[string],
                      died[string] ]* ]

The data we initially inserted had some missing dates. We can 
fill these in as follows: 

U3 : UPDATE $x AS books/book BY
       REPLACE IN year WITH "1859"
       WHERE $x/title/text() = "A Tale of Two Cities"
U4 : UPDATE $x AS books/book BY
       REPLACE IN year WITH "1865"
       WHERE $x/title/text() = "Alice in Wonderland"

Note that here, we use an XQuery expression $x/title/text() for
the WHERE-clause. Both updates leave the structure of the database 
unchanged. 

We can add an element to each book in books as follows: 

U5 : INSERT AS LAST INTO books/book 
VALUE publisher ["Grinch"] 

After U5, the books database has type 

     books[ book[ author[string], title[string],
                  year[string], publisher[string] ]* ]

Now perhaps we want to add a co-author; for example, perhaps 
Lewis Carroll collaborated on "Alice in Wonderland" with Charles 
Dickens. This is not as easy as adding the publisher field to the end 
because we need to select a particular node to insert before or after. 
In this case we happen to know that there is only one author, so we 
can insert after that; however, this would be incorrect if there were 
multiple authors, and we would have to do something else (such as 
inserting before the title). 

U6 : UPDATE $x AS books/book BY
       INSERT AFTER author
       VALUE <author>Charles Dickens</author>
       WHERE $x/title/text() = "Alice in Wonderland"

Now the books part of the database has the type: 

     books[ book[ author[string]*, title[string],
                  year[string], publisher[string] ]* ]

Now that some books have multiple authors, we might want to 
change the flat author lists to nested lists: 

U7 : REPLACE $x AS books/book WITH
       <book><authors>{$x/author}</authors>
             {$x/title}{$x/year}{$x/publisher}</book>

This visits each book and changes its structure so that the authors 
are grouped into an authors element. The resulting books subtree 
has type: 

     books[ book[ authors[author[string]*], title[string],
                  year[string], publisher[string] ]* ]



Suppose we later decide that the publisher field is unnecessary 
after all. We can get rid of it using the following update: 

U8 : DELETE books/book/publisher 

The books subtree in the result has type 

     books[ book[ authors[author[string]*],
                  title[string], year[string] ]* ]

Now suppose Lewis Carroll retires and we wish to remove all 
of his books from the database. 

U9 : DELETE $x AS books/book 

WHERE $x/authors/author/text() = "Lewis Carroll" 

This update does not modify the type of the database. Finally, we 
can delete a top-level document as follows: 

U10 : DELETE authors

2.4 Non-design goals

There are several things that other proposals for updating XML do 
that we make no attempt to do. We believe that these design choices 
are well-motivated for Flux's intended application area, database 
updates. 

Node identity: The XQuery data model provides identifiers for
all nodes. Many XML update proposals take node identities into
account and can use them to update parts of the tree "by reference".
In contrast, FLUX's semantics is purely value-based. Although there
are currently no examples involving node identity for
XQuery database updates in the W3C's requirements documents
(Chamberlin and Robie 2005), node identity is important in other
XML update settings such as the W3C's Document Object Model 
(DOM). We believe it is possible to adapt FLUX to a data model 
with node identity as long as the identifiers are not used as mutable 
references. 

Pattern matching: Many transformation/query languages (e.g.
(Hosoya et al. 2005; Clark 1999)) and some update languages
(e.g. (Liefke and Davidson 1999; Wang et al. 2003)) allow defining
transformations by pattern matching, that is, matching tree patterns
against the data. Pattern matching is very useful for XML transfor- 
mations in Web programming (e.g. converting an XML document 
into HTML), but we believe it is not as important for typical XML 
database updates. We have not considered general pattern matching 
in FLUX, in order to keep the type system and operational seman- 
tics as simple as possible. 

Side-effects in queries: Several motivating examples for XQuery!
(Ghelli et al. 2006) and XQueryP (Chamberlin et al. 2006) depend
on the ability to perform side-effects within queries. Examples in- 
clude logging accesses to particular data or profiling or debugging 
an XQuery program. FLUX cannot be used for these applications. 
However, it is debatable whether adding side-effects to XQuery 
is the best way to support logging, profiling, or debugging for 
XQuery. 

3. Core language formalization 

The high-level update language introduced in the last section is 
convenient for users, but its operations are complex, overlapping, 
and difficult to typecheck. Just as for XQuery and many other lan- 
guages, it is more convenient to define a core language with or- 
thogonal operations whose semantics and typing rules are simple 
and transparent, and then translate the high-level language to the 
core language.⁴ We first review the XML data model, regular
expression types, and the $\mu$XQ core query language of Colazzo et al.
(2006).



⁴ Such core languages are also typically easier to optimize, though we do
not consider optimization in this paper. 



3.1 XML values and regular expression types 

Following Colazzo et al. (2006), we distinguish between tree values
$t \in Tree$, which include strings $w \in \Sigma^*$ (for some
alphabet $\Sigma$), boolean values $\mathtt{true}, \mathtt{false} \in Bool$,
and singleton trees $n[v]$ where $n \in Lab$ is a node label; and (forest)
values $v \in Val = Tree^*$, which are sequences of tree values:

Tree values      $t ::= n[v] \mid w \mid \mathtt{true} \mid \mathtt{false}$
(Forest) values  $v ::= () \mid t, v$

We overload the set membership symbol $\in$ for trees and forests:
that is, $t \in v$ means that $t$ is a member of $v$ considered as a list. Two
forest values can be concatenated by concatenating them as lists;
abusing notation, we identify trees $t$ with singleton forests $t, ()$
and write $v, v'$ for forest concatenation. We define a comprehension
operation on forest values as follows:



$$[f(x) \mid x \in ()] = ()$$
$$[f(x) \mid x \in t, v] = f(t), [f(x) \mid x \in v]$$
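Read operationally, the two equations describe a concat-map over a forest. The following Python sketch illustrates this reading; the function name comprehension and the encoding of forests as lists (with element trees as (label, children) pairs and strings as text) are assumptions made for the example only.

```python
def comprehension(f, forest):
    """[f(x) | x in forest]: apply f to each tree, concatenating results in order."""
    result = []
    for tree in forest:
        result.extend(f(tree))
    return result

forest = [("a", []), "text", ("b", [("c", [])])]

# Mapping each tree to the singleton forest containing it returns the input.
same = comprehension(lambda t: [t], forest)

# A function may return several trees per input tree...
twice = comprehension(lambda t: [t, t], ["x", "y"])

# ...or none, which gives filtering.
only_elems = comprehension(lambda t: [t] if isinstance(t, tuple) else [], forest)
```

The empty-forest case of the definition corresponds to the loop body never running, and the second equation corresponds to one loop iteration followed by the comprehension over the remaining trees.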



This operation takes a forest $(t_1, \ldots, t_n)$ and a function $f(x)$
from trees to forests and applies $f$ to each tree $t_i$, concatenating
the resulting forests in order. Comprehensions satisfy basic monad
laws as well as some additional equations (see Fernandez et al.
2001). We use $=$ for (mathematical) equality of tree or forest
values.

We consider a regular expression type system with structural
subtyping, similar to those considered in several transformation and
query languages for XML (Hosoya et al. 2005; Colazzo et al. 2006;
Fernandez et al. 2001).

Atomic types    $\alpha ::= \mathtt{bool} \mid \mathtt{string} \mid n[\tau]$
Sequence types  $\tau ::= \alpha \mid () \mid \tau|\tau' \mid \tau, \tau' \mid \tau^* \mid X$

We call types of the form $\alpha \in Atom$ atomic types (or sometimes
tree or singular types), and types $\tau, \sigma \in Type$ of all other
forms sequence types (or sometimes forest or plural types). Sequence
types are constructed using regular expression operations
such as the empty sequence $()$, alternative choice $\tau|\tau'$, sequential
composition $\tau, \tau'$ and iteration (or Kleene star) $\tau^*$. Type variables
$X \in TyVar$ denoting recursively defined types are also allowed;
these must be declared in signatures as discussed below.

A value of singular type must always be a sequence of length
one (that is, a tree, string, or boolean); plural types may have
values of any length. There exist plural types with only values of
length one, but which are not syntactically singular (for example
string|bool). As usual, the + and ? quantifiers are definable as
follows: $\tau^+ = \tau, \tau^*$ and $\tau^? = \tau|()$.

We define type definitions and signatures as follows:

Type definitions  $\tau_0 ::= \alpha \mid () \mid \tau_0|\tau_0 \mid \tau_0, \tau_0 \mid \tau_0^*$
Type signatures   $E ::= \cdot \mid E, \mathtt{type}\ X = \tau_0$

Type definitions $\tau_0$ are types with no top-level variables (that is,
every variable is enclosed in an $n[-]$ context). A signature $E$ is
well-formed if all type variables appearing in definitions are also
declared in $E$. Given a well-formed signature $E$, we write $E(X)$
for the definition of $X$. A type $\tau$ denotes the set of values $[\![\tau]\!]_E$,
defined as follows.

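$$[\![\mathtt{string}]\!]_E = \Sigma^* \qquad [\![n[\tau]]\!]_E = \{ n[v] \mid v \in [\![\tau]\!]_E \}$$
$$[\![\mathtt{bool}]\!]_E = Bool \qquad [\![\tau|\tau']\!]_E = [\![\tau]\!]_E \cup [\![\tau']\!]_E$$
$$[\![()]\!]_E = \{()\} \qquad [\![X]\!]_E = [\![E(X)]\!]_E$$
$$[\![\tau, \tau']\!]_E = \{ v, v' \mid v \in [\![\tau]\!]_E, v' \in [\![\tau']\!]_E \}$$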

Formally, $[\![\tau]\!]_E$ is defined by a straightforward least fixed point
construction which we omit (see e.g. (Hosoya et al. 2005)). Henceforth,
we treat $E$ as fixed and define $[\![\tau]\!] = [\![\tau]\!]_E$. This
semantics validates standard identities such as associativity of ','
($[\![(\tau_1, \tau_2), \tau_3]\!] = [\![\tau_1, (\tau_2, \tau_3)]\!]$), unit laws
($[\![\tau, ()]\!] = [\![\tau]\!] = [\![(), \tau]\!]$), and idempotence of '*'
($[\![(\tau^*)^*]\!] = [\![\tau^*]\!]$).
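As an informal illustration of the type denotation, the Python sketch below decides membership of a forest in a type by naive splitting. It is a toy under stated assumptions, not the least fixed point construction or the XDuce algorithm: the tuple encoding of types, the function name member, and the treatment of the star case as zero or more non-empty repetitions are our own choices. Termination relies on type variables being guarded by an element constructor, as required of type definitions.

```python
def member(v, ty, sig):
    """Decide whether forest v (a tuple of trees) belongs to type ty under signature sig.

    Trees are bools, strings, or (label, forest) pairs. Types are tagged tuples:
    ("bool",), ("string",), ("elem", n, t), ("empty",), ("alt", t1, t2),
    ("seq", t1, t2), ("star", t), ("var", X).
    """
    kind = ty[0]
    if kind == "empty":
        return len(v) == 0
    if kind == "bool":
        return len(v) == 1 and isinstance(v[0], bool)
    if kind == "string":
        return len(v) == 1 and isinstance(v[0], str)
    if kind == "elem":
        return (len(v) == 1 and isinstance(v[0], tuple)
                and v[0][0] == ty[1] and member(v[0][1], ty[2], sig))
    if kind == "alt":
        return member(v, ty[1], sig) or member(v, ty[2], sig)
    if kind == "seq":  # try every way of splitting v into a prefix and a suffix
        return any(member(v[:i], ty[1], sig) and member(v[i:], ty[2], sig)
                   for i in range(len(v) + 1))
    if kind == "star":  # empty, or a non-empty first repetition followed by more
        return len(v) == 0 or any(
            member(v[:i], ty[1], sig) and member(v[i:], ty, sig)
            for i in range(1, len(v) + 1))
    if kind == "var":
        return member(v, sig[ty[1]], sig)
    raise ValueError(f"unknown type: {ty!r}")

STRING = ("string",)
book_ty = ("elem", "book", ("seq", ("star", ("elem", "author", STRING)),
                                   ("elem", "title", STRING)))
book = ("book", (("author", ("Charles Dickens",)),
                 ("title", ("A Tale of Two Cities",))))

sig = {"T": ("elem", "a", ("star", ("var", "T")))}  # type T = a[T*]
nested = ("a", (("a", ()), ("a", (("a", ()),))))
```

With these definitions, member((book,), book_ty, {}) and member((nested,), ("var", "T"), sig) both hold, and wrapping a type as ("seq", ty, ("empty",)) does not change the answer, in line with the unit law above.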

A type $\tau_1$ is a subtype of $\tau_2$ ($\tau_1 <: \tau_2$), by definition, if
$[\![\tau_1]\!] \subseteq [\![\tau_2]\!]$. The use of regular expressions (including untagged
unions) for XML typing poses a number of problems for subtyping
and typechecking which have been resolved in previous work on
XDuce (Hosoya et al. 2005). Our types are essentially the same as
those used in XDuce, so subtyping reduces to XDuce subtyping;
although this problem is EXPTIME-complete in general, the
algorithm of Hosoya et al. (2005) is well-behaved in practice.
Therefore, we shall not give explicit inference rules for checking or
deciding subtyping, but treat it as a "black box".

3.2 Core query language 

Because FLUX uses queries for insertion, replacement, and
conditionals, we need to introduce a query language and define its
semantics before doing the same for FLUX. In our implementation,
we use a variant of the $\mu$XQ core language introduced by
Colazzo et al. (2006), which has the following syntax:

$e ::= () \mid e, e \mid n[e] \mid w \mid x \mid \mathtt{let}\ x = e\ \mathtt{in}\ e$
$\quad \mid \mathtt{true} \mid \mathtt{false} \mid \mathtt{if}\ c\ \mathtt{then}\ e\ \mathtt{else}\ e \mid e \approx e$
$\quad \mid \bar{x} \mid \bar{x}/\mathtt{child} \mid e :: n \mid \mathtt{for}\ \bar{x} \in e\ \mathtt{return}\ e$

We follow the convention in (Colazzo et al. 2006) of using $\bar{x}$ for
variables introduced by for, which are always bound to tree values;
ordinary variables $x$ may be bound to any value.

An environment is a pair of functions $\gamma : (Var \to Val) \times
(TVar \to Tree)$. Abusing notation, we write $\gamma(x)$ for $\pi_1(\gamma)(x)$
and $\gamma(\bar{x})$ for $\pi_2(\gamma)(\bar{x})$; similarly, $\gamma[x := v]$ and $\gamma[\bar{x} := t]$ denote
the corresponding environment updating operations. The semantics
of queries is defined via the large-step operational semantics
judgment $\gamma \vdash e \Rightarrow v$, meaning "in environment $\gamma$, expression $e$
evaluates to value $v$". The contributions of this paper do not
require detailed understanding of the query language, so the rules are
relegated to the appendix. We omit recursive queries but they can
be added without difficulty.

3.3 Core update language 

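We now introduce the core FLUX update language, which includes
statements $s \in Stmt$, tests $\phi \in Test$, and directions $d \in Dir$:

$s ::= \mathtt{skip} \mid s; s \mid \mathtt{if}\ e\ \mathtt{then}\ s\ \mathtt{else}\ s \mid \mathtt{let}\ x = e\ \mathtt{in}\ s$
$\quad \mid \mathtt{insert}\ e \mid \mathtt{delete} \mid \mathtt{rename}\ n$
$\quad \mid \mathtt{snapshot}\ x\ \mathtt{in}\ s \mid \phi?s \mid d[s] \mid P(\vec{e})$
$\phi ::= n \mid \mathtt{node()} \mid \mathtt{text()}$
$d ::= \mathtt{left} \mid \mathtt{right} \mid \mathtt{children} \mid \mathtt{iter}$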

Here, $P$ denotes an update procedure name. Procedures are defined
via declarations $P(\vec{x} : \vec{\tau}) : \tau_1 \Rightarrow \tau_2 = s$, meaning $P$ takes
parameters $\vec{x}$ of types $\vec{\tau}$ and changes a database of type $\tau_1$ to one
of type $\tau_2$. We collect these declarations into a set $\Delta$, which we
take to be fixed throughout the rest of the paper. Procedures may be
recursive.

Updates include standard constructs such as the no-op skip, 
sequential composition, conditionals, and let-binding. Recall that 
updates work by focusing on selected parts of the mutable store. 
The basic update operations include insertion insert e, which 
inserts a value provided the focus is the empty sequence; dele- 
tion delete, which deletes the focus (replacing it with the empty 
sequence); and rename n, which renames the current focused 
value (provided it is a singleton tree). The "snapshot" operation 
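snapshot $x$ in $s$ binds $x$ to the current focused value and then
applies an update $s$, which may refer to $x$. There is no way to refer
to the focus of an update within a $\mu$XQ query without using
snapshot. Also, snapshot is not equivalent to XQuery!'s snap
operator; snapshot binds $x$ to an immutable value which can be
used in $s$, whereas snap forces execution of pending updates in
XQuery!.

Judgment form: $\gamma; v \vdash s \Rightarrow v'$

$$\dfrac{}{\gamma; v \vdash \mathtt{skip} \Rightarrow v}
\qquad
\dfrac{\gamma; v \vdash s \Rightarrow v_1 \quad \gamma; v_1 \vdash s' \Rightarrow v_2}{\gamma; v \vdash s; s' \Rightarrow v_2}$$

$$\dfrac{\gamma \vdash e \Rightarrow \mathtt{true} \quad \gamma; v \vdash s_1 \Rightarrow v'}{\gamma; v \vdash \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2 \Rightarrow v'}
\qquad
\dfrac{\gamma \vdash e \Rightarrow \mathtt{false} \quad \gamma; v \vdash s_2 \Rightarrow v'}{\gamma; v \vdash \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2 \Rightarrow v'}$$

$$\dfrac{\gamma \vdash e \Rightarrow v \quad \gamma[x := v]; v_1 \vdash s \Rightarrow v_2}{\gamma; v_1 \vdash \mathtt{let}\ x = e\ \mathtt{in}\ s \Rightarrow v_2}
\qquad
\dfrac{\gamma \vdash e \Rightarrow v}{\gamma; () \vdash \mathtt{insert}\ e \Rightarrow v}$$

$$\dfrac{}{\gamma; v \vdash \mathtt{delete} \Rightarrow ()}
\qquad
\dfrac{}{\gamma; n'[v] \vdash \mathtt{rename}\ n \Rightarrow n[v]}
\qquad
\dfrac{\gamma[x := v]; v \vdash s \Rightarrow v'}{\gamma; v \vdash \mathtt{snapshot}\ x\ \mathtt{in}\ s \Rightarrow v'}$$

$$\dfrac{t \in [\![\phi]\!] \quad \gamma; t \vdash s \Rightarrow v}{\gamma; t \vdash \phi?s \Rightarrow v}
\qquad
\dfrac{t \notin [\![\phi]\!]}{\gamma; t \vdash \phi?s \Rightarrow t}
\qquad
\dfrac{\gamma; v \vdash s \Rightarrow v'}{\gamma; n[v] \vdash \mathtt{children}[s] \Rightarrow n[v']}$$

$$\dfrac{\gamma; () \vdash s \Rightarrow v'}{\gamma; v \vdash \mathtt{left}[s] \Rightarrow v', v}
\qquad
\dfrac{\gamma; () \vdash s \Rightarrow v'}{\gamma; v \vdash \mathtt{right}[s] \Rightarrow v, v'}$$

$$\dfrac{\gamma; t_1 \vdash s \Rightarrow v_1' \quad \gamma; v_2 \vdash \mathtt{iter}[s] \Rightarrow v_2'}{\gamma; t_1, v_2 \vdash \mathtt{iter}[s] \Rightarrow v_1', v_2'}
\qquad
\dfrac{}{\gamma; () \vdash \mathtt{iter}[s] \Rightarrow ()}$$

$$\dfrac{P(\vec{x} : \vec{\tau}) : \tau_1 \Rightarrow \tau_2 = s \in \Delta \quad \gamma \vdash e_1 \Rightarrow v_1 \ \cdots \ \gamma \vdash e_n \Rightarrow v_n \quad \gamma[x_1 := v_1, \ldots, x_n := v_n]; v \vdash s \Rightarrow v'}{\gamma; v \vdash P(\vec{e}) \Rightarrow v'}$$

Figure 3. Operational semantics of updates.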

Updates also include tests (f)ls which allow us to examine the 
local structure of a tree value and perform an update if the structure 
matches. The node label test nls checks whether the focus is of 
the form n[v], and if so executes s, otherwise is a no-op; the 
wildcard test node()?s only checks that the value is a singleton 
tree. Similarly, text()?s tests whether the focus is a string. The ? 
operator binds tightly; for example, 4'ls; s' = {(t>?s); s' . 

Finally, updates include navigation operators that change the 
selected part of the tree and perform an update on the sub-selection. 
The left and right operators move to the left or right of a value. 
The children operator shifts focus to the children of a tree value. 
The iter operator shifts focus to all of the tree values in a forest. 

We distinguish between singular (unary) updates which apply 
to tree values and plural (multi-ary) updates which apply to se- 
quences. Tests (f>7s are always singular. The children operator 
applies a plural update to the children of a single node; the iter 
operator applies a singular update to all of the elements of a se- 
quence. Other updates can be either singular or plural in different 
situations. 

Figure [3] shows the operational semantics of Core FLUX. We 
write 7; ti h s =>° v' to indicate that given environment 7 and 
focus V, statement s updates v to value v'. The rules for tests are 
defined in terms of the following semantic interpretation of tests: 

[textOl = S* 

H = {n[v] I V G Val} 



model 



— Tree 



Note that we define the semantics entirely in terms of forest and 
tree values, without needing to define an explicit store. This would 
not be the case if we considered full XQuery, which includes node 
identity comparison operations. However, we believe our semantics 
is compatible with allowing node-identity tests in queries. 

Theorem 1 (Update determinism). Let j,v,s,vi,V2 be given such 

that 7; « h s =>° V\ and 7; u h s =>" V2. Then v\ — V2. 

Proof. Straightforward by induction on the structures of the two 
derivations. The interesting cases are those for conditionals, tests, 
and iteration, since they are the only statements that have more than 
one applicable rule. However, in each case, only matching pairs of 
rules are applicable. D 



As noted earlier, certain singular updates expect that the input value
is a singleton (for example, children, $n?s$, etc.) while plural updates
work for an arbitrary sequence of trees. Singular updates fail
if applied to a sequence. Our type system should prevent such run-time
failures. Moreover, as with all XML transformation languages,
we often would like to ensure that when given an input tree of some
type $\tau$, an update is guaranteed to produce an output tree of some
other type $\tau'$. For example, updates made by non-privileged users
are usually required to preserve the database schema.
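To make the run-time failures concrete, the following is a minimal sketch of an evaluator for a fragment of Core FLUX, written in Python. It is not the authors' implementation. The tuple encoding and the names `run`, `matches`, and `UpdateError` are assumptions made for illustration. The environment $\gamma$ is omitted, so `insert` carries an already-evaluated value, and conditionals, let, snapshot and procedure calls are not covered.

```python
# A forest is a tuple of trees; a tree is a str (text) or (label, forest).
# Statements are tuples such as ("iter", s) or ("test", "a", s).

class UpdateError(Exception):
    """Raised when no evaluation rule applies (a run-time failure)."""

def matches(tree, phi):
    """Semantic interpretation of the tests text(), node() and n."""
    if phi == "text()":
        return isinstance(tree, str)
    if phi == "node()":
        return True
    return isinstance(tree, tuple) and tree[0] == phi

def run(s, v):
    """Evaluate update s on focus v (a forest) and return the new focus."""
    op = s[0]
    if op == "skip":
        return v
    if op == "seq":
        return run(s[2], run(s[1], v))
    if op == "insert":                      # s[1] is the value to insert
        if v != ():
            raise UpdateError("insert needs an empty focus")
        return s[1]
    if op == "delete":
        return ()
    if op == "left":
        return run(s[1], ()) + v
    if op == "right":
        return v + run(s[1], ())
    if op == "iter":
        out = ()
        for t in v:
            out += run(s[1], (t,))
        return out
    if op in ("test", "rename", "children"):   # singular updates
        if len(v) != 1:
            raise UpdateError(f"{op} needs a single tree, got {len(v)}")
        (t,) = v
        if op == "test":
            return run(s[2], v) if matches(t, s[1]) else v
        if isinstance(t, str):
            raise UpdateError(f"{op} needs an element node")
        label, kids = t
        if op == "rename":
            return ((s[1], kids),)
        return ((label, run(s[1], kids)),)
    raise ValueError(f"unknown statement {op!r}")

# The document a[b[], c[], b[]], d[] and an update that inserts c[] after each b.
b, c, d = ("b", ()), ("c", ()), ("d", ())
doc = (("a", (b, c, b)), d)
update = ("iter", ("test", "a", ("children",
          ("iter", ("test", "b", ("right", ("insert", (c,))))))))
result = run(update, doc)
```

In this sketch, `run(("children", ("skip",)), doc)` raises `UpdateError`, because `children` is singular and `doc` is a two-tree sequence. This is the kind of failure the type system is meant to rule out statically.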

We define a matching relation between tree types and tests: we
say that $\alpha <: \phi$ if $[\![\alpha]\!] \subseteq [\![\phi]\!]$. This is decidable using the following
rules:

$$\mathsf{string} <: \mathsf{text}() \qquad n[\tau] <: n \qquad \alpha <: \mathsf{node}()$$

We employ a type system for queries similar to that developed
by Colazzo et al. (2006). We consider type environments $\Gamma$
consisting of sets of bindings $x : \tau$ of variables to types and $\bar{x} : \alpha$ of tree
variables to atomic types. (We never need to bind a tree variable
to a sequence type.) As usual, we assume that variables in type
environments are distinct; this convention implicitly constrains all
inference rules. We write $[\![\Gamma]\!]$ for the set of all environments $\gamma$ such
that $\gamma(x) \in [\![\Gamma(x)]\!]$ and $\gamma(\bar{x}) \in [\![\Gamma(\bar{x})]\!]$ for all $x \in \mathrm{dom}(\Gamma)$ and
$\bar{x} \in \mathrm{dom}(\Gamma)$ respectively.

The typing judgment for queries is $\Gamma \vdash e : \tau$, meaning in
type environment $\Gamma$, expression $e$ has type $\tau$. The typing rules are
essentially the same as those in Colazzo et al. (2006).

The main typing judgment for updates is $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$,
meaning in type environment $\Gamma$, an $a$-ary update $s$ maps values of
type $\tau$ to type $\tau'$. Here, $a \in \{1, *\}$ is the arity of the update, and
singular update judgments always have $\tau = \alpha$ atomic. In addition,
we define auxiliary judgments $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}$ for typechecking
iterations and $\vdash_{\mathit{Decl}} \Delta$ for typechecking declarations $\Delta$. The rules
for update well-formedness are shown in Figure 4.

4.1 Discussion 

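In many functional languages, and several XML update proposals,
side-effecting operations are treated as expressions that return ().
Thus, we could typecheck such updates as expressions of type ().
This is straightforward provided the types of values reachable from
the free variables in $\Gamma$ do not change; for example, this is the case
for ML-style references. However, if the side-effects do change the
types of the values of variables, then $\Gamma$ needs to be updated to take
these changes into account. One possibility is to typecheck updates
using a residuating judgment $\Gamma \vdash s : () \mid \Gamma'$; here, $\Gamma'$ is the
updated type environment reflecting the types of the variables after
update $s$. This approach quickly becomes complicated, especially
if it is possible for variables to "alias", or refer to overlapping parts
of the data.

Judgment form: $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$

$$\Gamma \vdash^{a} \{\tau\}\ \mathsf{skip}\ \{\tau\}
\qquad
\dfrac{\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\} \quad \Gamma \vdash^{a} \{\tau'\}\ s'\ \{\tau''\}}{\Gamma \vdash^{a} \{\tau\}\ s; s'\ \{\tau''\}}$$

$$\dfrac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash^{a} \{\tau\}\ s\ \{\tau_1\} \quad \Gamma \vdash^{a} \{\tau\}\ s'\ \{\tau_2\}}{\Gamma \vdash^{a} \{\tau\}\ \mathsf{if}\ e\ \mathsf{then}\ s\ \mathsf{else}\ s'\ \{\tau_1 | \tau_2\}}$$

$$\dfrac{\Gamma \vdash e : \tau}{\Gamma \vdash^{*} \{()\}\ \mathsf{insert}\ e\ \{\tau\}}
\qquad
\Gamma \vdash^{a} \{\tau\}\ \mathsf{delete}\ \{()\}
\qquad
\Gamma \vdash^{1} \{n'[\tau]\}\ \mathsf{rename}\ n\ \{n[\tau]\}$$

$$\dfrac{\Gamma \vdash e : \tau \quad \Gamma, x : \tau \vdash^{a} \{\tau_1\}\ s\ \{\tau_2\}}{\Gamma \vdash^{a} \{\tau_1\}\ \mathsf{let}\ x = e\ \mathsf{in}\ s\ \{\tau_2\}}
\qquad
\dfrac{\Gamma, x : \tau \vdash^{a} \{\tau\}\ s\ \{\tau'\}}{\Gamma \vdash^{a} \{\tau\}\ \mathsf{snapshot}\ x\ \mathsf{in}\ s\ \{\tau'\}}$$

$$\dfrac{\alpha <: \phi \quad \Gamma \vdash^{1} \{\alpha\}\ s\ \{\tau\}}{\Gamma \vdash^{1} \{\alpha\}\ \phi?s\ \{\tau\}}
\qquad
\dfrac{\alpha \not<: \phi}{\Gamma \vdash^{1} \{\alpha\}\ \phi?s\ \{\alpha\}}
\qquad
\dfrac{\Gamma \vdash^{*} \{\tau\}\ s\ \{\tau'\}}{\Gamma \vdash^{1} \{n[\tau]\}\ \mathsf{children}[s]\ \{n[\tau']\}}$$

$$\dfrac{\Gamma \vdash^{*} \{()\}\ s\ \{\tau'\}}{\Gamma \vdash^{a} \{\tau\}\ \mathsf{left}[s]\ \{\tau', \tau\}}
\qquad
\dfrac{\Gamma \vdash^{*} \{()\}\ s\ \{\tau'\}}{\Gamma \vdash^{a} \{\tau\}\ \mathsf{right}[s]\ \{\tau, \tau'\}}$$

$$\dfrac{\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}}{\Gamma \vdash^{*} \{\tau\}\ \mathsf{iter}[s]\ \{\tau'\}}
\qquad
\dfrac{\Gamma \vdash^{a} \{\tau_1\}\ s\ \{\tau_2'\} \quad \tau_2' <: \tau_2}{\Gamma \vdash^{a} \{\tau_1\}\ s\ \{\tau_2\}}$$

$$\dfrac{P(\bar{x} : \bar{\tau}) : \sigma_1 \Rightarrow \sigma_2 \triangleq s \in \Delta \quad \sigma_1' <: \sigma_1 \quad \Gamma \vdash e_1 : \tau_1' \quad \tau_1' <: \tau_1 \ \cdots\ \Gamma \vdash e_n : \tau_n' \quad \tau_n' <: \tau_n}{\Gamma \vdash^{*} \{\sigma_1'\}\ P(\bar{e})\ \{\sigma_2\}}$$

Judgment form: $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}$

$$\Gamma \vdash_{\mathit{iter}} \{()\}\ s\ \{()\}
\qquad
\dfrac{\Gamma \vdash^{1} \{\alpha\}\ s\ \{\tau\}}{\Gamma \vdash_{\mathit{iter}} \{\alpha\}\ s\ \{\tau\}}
\qquad
\dfrac{\Gamma \vdash_{\mathit{iter}} \{\tau_1\}\ s\ \{\tau_2\}}{\Gamma \vdash_{\mathit{iter}} \{\tau_1^*\}\ s\ \{\tau_2^*\}}$$

$$\dfrac{\Gamma \vdash_{\mathit{iter}} \{\tau_1\}\ s\ \{\tau_1'\} \quad \Gamma \vdash_{\mathit{iter}} \{\tau_2\}\ s\ \{\tau_2'\}}{\Gamma \vdash_{\mathit{iter}} \{\tau_1, \tau_2\}\ s\ \{\tau_1', \tau_2'\}}$$

$$\dfrac{\Gamma \vdash_{\mathit{iter}} \{\tau_1\}\ s\ \{\tau_1'\} \quad \Gamma \vdash_{\mathit{iter}} \{\tau_2\}\ s\ \{\tau_2'\}}{\Gamma \vdash_{\mathit{iter}} \{\tau_1 | \tau_2\}\ s\ \{\tau_1' | \tau_2'\}}
\qquad
\dfrac{\Gamma \vdash_{\mathit{iter}} \{E(X)\}\ s\ \{\tau\}}{\Gamma \vdash_{\mathit{iter}} \{X\}\ s\ \{\tau\}}$$

Judgment form: $\vdash_{\mathit{Decl}} \Delta$

$$\vdash_{\mathit{Decl}} \emptyset
\qquad
\dfrac{\vdash_{\mathit{Decl}} \Delta \quad \bar{x} : \bar{\tau} \vdash^{*} \{\tau_1\}\ s\ \{\tau_2\}}{\vdash_{\mathit{Decl}} \Delta, P(\bar{x} : \bar{\tau}) : \tau_1 \Rightarrow \tau_2 \triangleq s}$$

Figure 4. Update, iteration, and declaration well-formedness.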

In Flux, we take a completely different approach to typechecking
updates. The judgment $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$ assigns an update
much richer type information that describes the type of the updatable
context before and after running $s$. The variables in $\Gamma$ are
immutable, so their types never need to be updated.

The most unusual rules are those for the iter, test, children,
left/right, and insert/rename/delete operators.
The following example illustrates how the rules work for these
constructs. Consider the update:

iter[a?children[iter[b?right[insert c[]]]]]

Intuitively, this update inserts a $c$ after every $b$ under a top-level $a$.
Now consider the input type $a[b[]^*, c[], b[]^*], d[]$. Clearly, the output
type should be $a[(b[], c[])^*, c[], (b[], c[])^*], d[]$. To see why this is
the case, first note that the following can be derived for any $\tau, \tau', s$:

$$\dfrac{\vdash^{1} \{a[\tau]\}\ s\ \{a[\tau']\}}{\vdash^{*} \{a[\tau], d[]\}\ \mathsf{iter}[a?s]\ \{a[\tau'], d[]\}}$$

Using the rule for children, we can see that it suffices to check
that iter[b?right[insert c[]]] maps type $b[]^*, c[], b[]^*$ to
$(b[], c[])^*, c[], (b[], c[])^*$. This is also an instance of a derivable rule

$$\dfrac{\vdash^{1} \{b[]\}\ s\ \{\tau\}}{\vdash^{*} \{b[]^*, c[], b[]^*\}\ \mathsf{iter}[b?s]\ \{\tau^*, c[], \tau^*\}}$$

Hence, we now need to show only that right[insert c[]] maps
type $b[]$ to $b[], c[]$, which is immediate:

$$\dfrac{\dfrac{\dfrac{\vdash () : ()}{\vdash c[()] : c[()]}}{\vdash^{*} \{()\}\ \mathsf{insert}\ c[]\ \{c[]\}}}{\vdash^{1} \{b[]\}\ \mathsf{right}[\mathsf{insert}\ c[]]\ \{b[], c[]\}}$$

4.2 Metatheory

We take for granted the following type soundness property for
queries (this was proved for $\mu$XQ in Colazzo et al. (2006)).

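Theorem 2 (Query soundness). If $\Gamma \vdash e : \tau$ and $\gamma \in [\![\Gamma]\!]$ then
$\gamma \vdash e \Rightarrow v$ implies $v \in [\![\tau]\!]$.

The corresponding result also holds for updates, by a straightforward
structural induction argument (presented in the appendix):

Theorem 3 (Update soundness). Assume $\vdash_{\mathit{Decl}} \Delta$ holds.

1. If $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$, $v \in [\![\tau]\!]$, and $\gamma \in [\![\Gamma]\!]$, then $\gamma; v \vdash s \Rightarrow^{U} v'$
implies $v' \in [\![\tau']\!]$.

2. If $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}$, $v \in [\![\tau]\!]$, and $\gamma \in [\![\Gamma]\!]$, then
$\gamma; v \vdash \mathsf{iter}[s] \Rightarrow^{U} v'$ implies $v' \in [\![\tau']\!]$.

Moreover, typechecking is decidable for both $\mu$XQ and FLUX
in the presence of the subsumption rules (Cheney 2008).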
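The worked example of Section 4.1 can be replayed mechanically. The sketch below is a hypothetical Python rendering of the syntax-directed part of the update and iteration rules. It leaves out subsumption, recursive type variables, conditionals, let, snapshot and procedure calls, and it does not track the arity annotation separately: singular statements simply demand an atomic input type. The tuple encoding and all names (`check`, `check_iter`, `matches`, `FluxTypeError`) are assumptions, and `insert` carries the type of the inserted expression rather than the expression itself.

```python
class FluxTypeError(Exception):
    """Raised when no typing rule of this sketch applies."""

EMPTY = ("empty",)                       # the type ()

def elem(n, body=EMPTY):                 # the type n[body]
    return ("elem", n, body)

def seq(*ts):                            # right-nested sequence type t1, t2, ...
    out = ts[-1]
    for t in reversed(ts[:-1]):
        out = ("seq", t, out)
    return out

def star(t):                             # the type t*
    return ("star", t)

def matches(alpha, phi):
    """The matching relation alpha <: phi on atomic types."""
    if phi == "node()":
        return True
    if phi == "text()":
        return alpha == ("string",)
    return alpha[0] == "elem" and alpha[1] == phi

def check(s, tau):
    """Return the output type of update s on input type tau."""
    op = s[0]
    if op == "skip":
        return tau
    if op == "seq":
        return check(s[2], check(s[1], tau))
    if op == "insert":                   # s[1]: assumed type of the inserted value
        if tau != EMPTY:
            raise FluxTypeError("insert expects input type ()")
        return s[1]
    if op == "delete":
        return EMPTY
    if op == "left":
        return ("seq", check(s[1], EMPTY), tau)
    if op == "right":
        return ("seq", tau, check(s[1], EMPTY))
    if op == "iter":
        return check_iter(s[1], tau)
    # test, rename and children are singular: tau must be atomic
    if tau[0] not in ("elem", "string"):
        raise FluxTypeError(f"{op} expects an atomic input type")
    if op == "test":
        return check(s[2], tau) if matches(tau, s[1]) else tau
    if tau[0] != "elem":
        raise FluxTypeError(f"{op} expects an element type")
    if op == "rename":
        return elem(s[1], tau[2])
    if op == "children":
        return elem(tau[1], check(s[1], tau[2]))
    raise FluxTypeError(f"unsupported statement {op!r}")

def check_iter(s, tau):
    """Iteration judgment: push s down to the atomic types inside tau."""
    kind = tau[0]
    if kind == "empty":
        return tau
    if kind in ("elem", "string"):
        return check(s, tau)
    if kind == "star":
        return ("star", check_iter(s, tau[1]))
    if kind in ("seq", "alt"):
        return (kind, check_iter(s, tau[1]), check_iter(s, tau[2]))
    raise FluxTypeError(f"unsupported type {kind!r}")

b, c, d = elem("b"), elem("c"), elem("d")
update = ("iter", ("test", "a", ("children",
          ("iter", ("test", "b", ("right", ("insert", c)))))))
input_type = seq(elem("a", seq(star(b), c, star(b))), d)
output_type = check(update, input_type)
```

Under these assumptions `output_type` is the encoding of $a[(b[], c[])^*, c[], (b[], c[])^*], d[]$. Note how the iteration case preserves the order and multiplicity structure of the input type instead of collapsing it.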

5. Normalization 

There is a significant gap between the high-level FLUX language
we presented in Section 2 and the core language in the previous
section. In this section, we formalize a translation from the source
language presented in Section 2 to Core FLUX. In XQuery, this
kind of translation is called normalization. We define three
normalization functions called path expression normalization $[\![-]\!]_{\mathit{Path}}(-)$,
update statement normalization $[\![-]\!]_{\mathit{Stmt}}$, and simple update
normalization $[\![-]\!]_{\mathit{Upd}}$. These functions are defined in Figure 5.

Path expression normalization takes an extra parameter, which
must be a core FLUX update; that is, $[\![p]\!]_{\mathit{Path}}(s)$ normalizes a
path $p$ by expanding it to an expression which navigates to $p$
and then does $s$. Compound statement normalization is
straightforward. Simple updates are first normalized by translation to
$p$. We omit the cases needed to handle WHERE-clauses; however,
they can be handled by the existing translation if we
consider e.g. REPLACE $p$ WITH $e$ WHERE $c$ to be an abbreviation for
REPLACE $p[c]$ WITH $e$, etc. In particular, note that the translation
places both $c$ and $e$ into the scope of all variables declared in $p$.

Since the translation rules cover all cases and are orthogonal, it 
is straightforward to see that the normalization functions are total 
functions from the source language to Core FLUX. 
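The following sketch illustrates how path and simple-update normalization compose. It is a hypothetical Python rendering of a selection of the cases in Figure 5 (the INSERT AS FIRST/LAST INTO and UPDATE ... BY cases are omitted). Paths and statements are encoded as nested tuples of our own design, and query expressions are opaque strings.

```python
# Paths:   ("self",), ("step", phi), ("slash", p1, p2), ("filter", p, e), ("as", x, p)
# Updates: (KEYWORD, path, optional argument)

def norm_path(p, s):
    """[[p]]_Path(s): navigate to p, then run the core update s."""
    kind = p[0]
    if kind == "self":                       # .
        return s
    if kind == "slash":                      # p/p'
        return norm_path(p[1], norm_path(p[2], s))
    if kind == "step":                       # a node test phi
        return ("children", ("iter", ("test", p[1], s)))
    if kind == "filter":                     # p[e]
        return norm_path(p[1], ("if", p[2], s, ("skip",)))
    if kind == "as":                         # x AS p
        return norm_path(p[2], ("snapshot", p[1], s))
    raise ValueError(f"unknown path form {kind!r}")

def norm_update(u):
    """[[u]]_Upd for a few simple updates."""
    kind, p = u[0], u[1]
    if kind == "INSERT BEFORE":
        body = ("left", ("insert", u[2]))
    elif kind == "INSERT AFTER":
        body = ("right", ("insert", u[2]))
    elif kind == "DELETE":
        body = ("delete",)
    elif kind == "DELETE FROM":
        body = ("children", ("delete",))
    elif kind == "RENAME":
        body = ("rename", u[2])
    elif kind == "REPLACE":
        body = ("seq", ("delete",), ("insert", u[2]))
    elif kind == "REPLACE IN":
        body = ("children", ("seq", ("delete",), ("insert", u[2])))
    else:
        raise ValueError(f"unsupported update {kind!r}")
    return norm_path(p, body)

a_b = ("slash", ("step", "a"), ("step", "b"))
delete_core = norm_update(("DELETE", a_b))

# REPLACE a WITH e WHERE c, read as REPLACE a[c] WITH e
replace_core = norm_update(("REPLACE", ("filter", ("step", "a"), "c"), "e"))
```

Here `delete_core` corresponds to children[iter[a?children[iter[b?delete]]]], and `replace_core` shows how a WHERE-clause ends up as a conditional wrapped around the core update.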

5.1 Typechecking source updates 

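Normalization complicates type-error reporting, since we cannot
always easily explain why the translation of an update fails to
typecheck in source-level terms familiar to the user. We therefore
also develop a type system for the source language that is both
sound and complete with respect to core FLUX typechecking. This
type system can therefore be used to report type errors to users in
terms of the source language. We assume that query subexpressions
$e$ have already been normalized to $\mu$XQ according to the standard
XQuery normalization rules (Draper et al. 2007). The problem of
typechecking unnormalized XQuery expressions is an orthogonal
issue (and one that has to our knowledge not been addressed).

$$
\begin{array}{rcl}
[\![\text{IF } e \text{ THEN } s]\!]_{\mathit{Stmt}} &=& \mathsf{if}\ e\ \mathsf{then}\ [\![s]\!]_{\mathit{Stmt}}\ \mathsf{else}\ \mathsf{skip} \\
[\![\text{LET } x = e \text{ IN } s]\!]_{\mathit{Stmt}} &=& \mathsf{let}\ x = e\ \mathsf{in}\ [\![s]\!]_{\mathit{Stmt}} \\[4pt]
[\![.]\!]_{\mathit{Path}}(s) &=& s \\
[\![p/p']\!]_{\mathit{Path}}(s) &=& [\![p]\!]_{\mathit{Path}}([\![p']\!]_{\mathit{Path}}(s)) \\
[\![\phi]\!]_{\mathit{Path}}(s) &=& \mathsf{children}[\mathsf{iter}[\phi?s]] \\
[\![p[e]]\!]_{\mathit{Path}}(s) &=& [\![p]\!]_{\mathit{Path}}(\mathsf{if}\ e\ \mathsf{then}\ s\ \mathsf{else}\ \mathsf{skip}) \\
[\![x \text{ AS } p]\!]_{\mathit{Path}}(s) &=& [\![p]\!]_{\mathit{Path}}(\mathsf{snapshot}\ x\ \mathsf{in}\ s) \\[4pt]
[\![\text{INSERT BEFORE } p \text{ VALUE } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{left}[\mathsf{insert}\ e]) \\
[\![\text{INSERT AFTER } p \text{ VALUE } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{right}[\mathsf{insert}\ e]) \\
[\![\text{INSERT AS LAST INTO } p \text{ VALUE } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{children}[\mathsf{left}[\mathsf{insert}\ e]]) \\
[\![\text{INSERT AS FIRST INTO } p \text{ VALUE } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{children}[\mathsf{right}[\mathsf{insert}\ e]]) \\
[\![\text{DELETE } p]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{delete}) \\
[\![\text{DELETE FROM } p]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{children}[\mathsf{delete}]) \\
[\![\text{RENAME } p \text{ TO } n]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{rename}\ n) \\
[\![\text{REPLACE } p \text{ WITH } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{delete}; \mathsf{insert}\ e) \\
[\![\text{REPLACE IN } p \text{ WITH } e]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}(\mathsf{children}[\mathsf{delete}; \mathsf{insert}\ e]) \\
[\![\text{UPDATE } p \text{ BY } s]\!]_{\mathit{Upd}} &=& [\![p]\!]_{\mathit{Path}}([\![s]\!]_{\mathit{Stmt}})
\end{array}
$$

Figure 5. Source update normalization

Judgment form: $\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}$

$$\dfrac{\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s_1\ \{\tau'\} \quad \Gamma \vdash_{\mathit{Stmt}} \{\tau'\}\ s_2\ \{\tau''\}}{\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s_1; s_2\ \{\tau''\}}
\qquad
\dfrac{\Gamma \vdash e : \mathsf{bool} \quad \Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}}{\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ \text{IF } e \text{ THEN } s\ \{\tau | \tau'\}}$$

$$\dfrac{\Gamma \vdash e : \tau_0 \quad \Gamma, x : \tau_0 \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}}{\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ \text{LET } x = e \text{ IN } s\ \{\tau'\}}
\qquad
\dfrac{\Gamma \vdash_{\mathit{Upd}} \{\alpha\}\ u\ \{\alpha'\}}{\Gamma \vdash_{\mathit{Stmt}} \{\alpha\}\ u\ \{\alpha'\}}$$

Figure 6. Typechecking rules for compound updates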

Typechecking source-level updates is challenging because sim- 
ple updates may change the types of many parts of the document 
simultaneously, depending on the structure of the path p. In con- 
trast, core Flux updates are easy to typecheck because they break 
the corresponding navigation, selection, and modification of types 
into small, manageable steps. 

To deal with the non-local nature of source updates, we employ
type variables $Z$ and context-tagged type substitutions $\Theta$. These
substitutions are defined as follows:

$$\Theta ::= \emptyset \mid \Theta, Z \mapsto (\Gamma \triangleright \tau)$$

We distinguish the type variables $Z$ we will use here for typechecking
source updates from the type variables $X$ used in recursive type
definitions $E$; we refer to the latter as defined type variables. We
require the variables $Z$ bound in $\Theta$ to be unique.

We often treat substitutions $\Theta$ as sets or finite maps and in
particular write $\Theta \uplus \Theta'$ for the context-tagged substitution resulting
from taking the union of the bindings in $\Theta$ and $\Theta'$, provided their
domains are disjoint. We also write $\tau\langle\Theta\rangle$ for the result of replacing
each occurrence of an undefined type variable in $\tau$ with its binding
in $\Theta$. Moreover, we write $\Theta\langle\Theta'\rangle$ for the result of applying $\Theta'$ to
each type in $\Theta$, again ignoring contexts. Substitution application
ignores the contexts $\Gamma$; they are only used to typecheck updates
within the scope of a path. We consider the free type variables of $\Theta$
to be the free variables of $\Gamma$ and $\tau$ in the bindings $\Gamma \triangleright \tau$.

To typecheck a simple update such as DELETE a/b against an
atomic type such as $a[b[], c[]]$, we proceed as follows:

1. First match $p$ against the input type, and split the type $\alpha$ of the
document into a pair $(\alpha', \Theta)$ such that $\alpha = \alpha'\langle\Theta\rangle$.

For example, $a[b[], c[]] = a[Z, c[]]\langle Z \mapsto \cdot \triangleright b[] \rangle$.

2. Next modify $\Theta$ according to the update operation to obtain $\Theta'$.
For this we use the Core FLUX type system to update each
binding in $\Theta$. This is only a convenience.

Continuing the example, for a deletion we update $Z \mapsto \cdot \triangleright b[]$
to $Z \mapsto \cdot \triangleright ()$.

3. Finally, apply $\Theta'$ to $\alpha'$ to get the desired final type after the
update.

For example, applying $a[Z, c[]]\langle Z \mapsto \cdot \triangleright () \rangle$ we get $a[(), c[]]$,
as desired (this is equivalent to $a[c[]]$).
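The three steps above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the formal system: paths are plain lists of element labels matched from the root element downwards (as in the prose example), the contexts $\Gamma$ attached to bindings are dropped, and the only update handled is deletion. The tuple encoding of types and the names `split`, `apply_subst`, and `delete_type` are hypothetical.

```python
import itertools

EMPTY = ("empty",)                          # the type ()

def elem(n, body=EMPTY):                    # the type n[body]
    return ("elem", n, body)

def split(tau, labels, theta, fresh):
    """Step 1: replace each subtype selected by the label path with a fresh
    variable Z, recording the replaced type in theta."""
    kind = tau[0]
    if kind == "elem":
        if tau[1] != labels[0]:
            return tau                      # not on the path: left unchanged
        if len(labels) == 1:
            z = f"Z{next(fresh)}"
            theta[z] = tau
            return ("var", z)
        return elem(tau[1], split(tau[2], labels[1:], theta, fresh))
    if kind in ("seq", "alt"):
        return (kind, split(tau[1], labels, theta, fresh),
                      split(tau[2], labels, theta, fresh))
    if kind == "star":
        return ("star", split(tau[1], labels, theta, fresh))
    return tau                              # () and string

def apply_subst(tau, theta):
    """Step 3: replace each variable Z in tau by its binding in theta."""
    kind = tau[0]
    if kind == "var":
        return theta[tau[1]]
    if kind == "elem":
        return elem(tau[1], apply_subst(tau[2], theta))
    if kind in ("seq", "alt"):
        return (kind, apply_subst(tau[1], theta), apply_subst(tau[2], theta))
    if kind == "star":
        return ("star", apply_subst(tau[1], theta))
    return tau

def delete_type(tau, labels):
    """Type of the document after deleting everything selected by labels."""
    theta = {}
    holes = split(tau, labels, theta, itertools.count(1))     # step 1
    theta_after = {z: EMPTY for z in theta}                   # step 2: delete
    return holes, theta, apply_subst(holes, theta_after)      # step 3

doc_type = elem("a", ("seq", elem("b"), elem("c")))           # a[b[], c[]]
holes, theta, result = delete_type(doc_type, ["a", "b"])
```

In this run `holes` encodes $a[Z_1, c[]]$, `theta` binds $Z_1$ to $b[]$, and `result` encodes $a[(), c[]]$.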

Figures 7 and 8 show selected typechecking judgments for simple
updates and paths. We introduce auxiliary judgments such as
path filter checking ($\Gamma \vdash \tau :: \phi \leadsto (\tau', \Theta)$, shown in Figure 9),
simultaneous core statement checking ($\vdash^{1} \{\Theta\}\ s\ \{\Theta'\}$), and
simultaneous path checking $\vdash_{\mathit{Path}} \{\Theta\}\ p \leadsto (\Theta', \Theta'')$.

For many of the typechecking judgments, we also need to
typecheck an expression against all of the bindings of a
context-tagged substitution $\Theta$. We therefore introduce several
simultaneous typechecking judgments. The full system, including the
(straightforward) compound statement typechecking judgment
$\Gamma \vdash_{\mathit{Stmt}} \{\alpha\}\ s\ \{\alpha'\}$ and all auxiliary judgments, is shown in
the appendix.

The simple update typechecking rules each follow the procedure
outlined above. The path typechecking rules match $p$ against the
input type $\alpha$ as described in step 1 above. Note that paths may bind
variables, and the same variable may be bound to different types in
different cases; this is why we need to include contexts $\Gamma$ in each
binding of the substitutions $\Theta$.

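The following operation $\Theta \oplus x$ is used to typecheck $x$ AS $p$; it
adds the binding $x : \tau$ to each binding $\Gamma \triangleright \tau$ in $\Theta$.

$$
\begin{array}{rcl}
\emptyset \oplus x &=& \emptyset \\
(\Theta, Z \mapsto (\Gamma \triangleright \tau)) \oplus x &=& \Theta \oplus x, Z \mapsto (\Gamma, x : \tau \triangleright \tau)
\end{array}
$$

The typechecking rule for conditional paths $p[e]$ is slightly subtle.
After typechecking $p$, we obtain a pair $(\alpha', \Theta)$ that splits $\alpha$ into
an unchanged part $\alpha'$ and a substitution $\Theta$ showing where changes
may occur. Since we do not know whether $e$ will hold, we must
adjust $\alpha'$ by replacing each occurrence of a variable $Z$ with $Z | \Theta(Z)$.
This is accomplished using the substitution $\Theta?$:

$$
\begin{array}{rcl}
\emptyset? &=& \emptyset \\
(\Theta, Z \mapsto (\Gamma \triangleright \alpha))? &=& \Theta?, Z \mapsto (\Gamma \triangleright Z | \alpha)
\end{array}
$$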
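Both operations are simple maps over the bindings of a substitution. A minimal Python sketch follows, assuming a substitution is represented as a dictionary from variable names to (context, type) pairs, with contexts as dictionaries and types as tuples; the function names `tag_with_var` and `optional` are our own.

```python
def tag_with_var(theta, x):
    """Theta (+) x: extend each binding's context with x bound to the type
    of that same binding."""
    return {z: ({**gamma, x: tau}, tau) for z, (gamma, tau) in theta.items()}

def optional(theta):
    """Theta?: rebind each Z to the alternative Z | alpha, reflecting that
    the path filter may fail and leave the original position untouched."""
    return {z: (gamma, ("alt", ("var", z), alpha))
            for z, (gamma, alpha) in theta.items()}

b_type = ("elem", "b", ("empty",))
theta = {"Z": ({}, b_type)}
tagged = tag_with_var(theta, "x")
maybe = optional(theta)
```

Neither function mutates its argument, which matches the purely functional reading of the definitions.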

5.2 Metatheory 

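Whenever we translate between two typed languages, we would
like to know whether the translation is sound (i.e., type-preserving).
This ensures that if we typecheck the expression in the source
language then its translation will also typecheck. (In an
implementation, one often wants to re-typecheck after translation
anyway as a sanity check for the translator.) Conversely, if
the source language expression fails to typecheck, it is preferable
to report the error in terms of the source language using the source
type system. We have established that the translation is indeed
sound:

Judgment form: $\Gamma \vdash_{\mathit{Upd}} \{\alpha\}\ u\ \{\alpha'\}$

$$\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta) \quad \vdash^{1} \{\Theta\}\ \mathsf{left}[\mathsf{insert}\ e]\ \{\Theta'\}}{\Gamma \vdash_{\mathit{Upd}} \{\alpha\}\ \text{INSERT BEFORE } p \text{ VALUE } e\ \{\alpha'\langle\Theta'\rangle\}}$$

$$\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta) \quad \vdash^{1} \{\Theta\}\ \mathsf{children}[\mathsf{left}[\mathsf{insert}\ e]]\ \{\Theta'\}}{\Gamma \vdash_{\mathit{Upd}} \{\alpha\}\ \text{INSERT AS LAST INTO } p \text{ VALUE } e\ \{\alpha'\langle\Theta'\rangle\}}$$

$$\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta) \quad \vdash^{1} \{\Theta\}\ \mathsf{delete}\ \{\Theta'\}}{\Gamma \vdash_{\mathit{Upd}} \{\alpha\}\ \text{DELETE } p\ \{\alpha'\langle\Theta'\rangle\}}$$

Figure 7. Selected typechecking rules for simple updates

Judgment form: $\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta)$

$$\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ . \leadsto (Z, Z \mapsto (\Gamma \triangleright \alpha))
\qquad
\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta_1) \quad \vdash_{\mathit{Path}} \{\Theta_1\}\ p' \leadsto (\Theta_2, \Theta_2')}{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p/p' \leadsto (\alpha'\langle\Theta_2\rangle, \Theta_2')}$$

$$\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta) \quad \vdash \{\Theta\}\ e : \mathsf{bool}}{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p[e] \leadsto (\alpha'\langle\Theta?\rangle, \Theta)}
\qquad
\dfrac{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta)}{\Gamma \vdash_{\mathit{Path}} \{\alpha\}\ x \text{ AS } p \leadsto (\alpha', \Theta \oplus x)}$$

$$\dfrac{\Gamma \vdash \tau :: \phi \leadsto (\tau', \Theta)}{\Gamma \vdash_{\mathit{Path}} \{n[\tau]\}\ \phi \leadsto (n[\tau'], \Theta)}$$

Figure 8. Typechecking rules for paths

Judgment form: $\Gamma \vdash \tau :: \phi \leadsto (\tau', \Theta)$

$$\Gamma \vdash () :: \phi \leadsto ((), \emptyset)
\qquad
\dfrac{\alpha \not<: \phi}{\Gamma \vdash \alpha :: \phi \leadsto (\alpha, \emptyset)}
\qquad
\dfrac{\alpha <: \phi \quad Z \text{ fresh}}{\Gamma \vdash \alpha :: \phi \leadsto (Z, Z \mapsto (\Gamma \triangleright \alpha))}$$

$$\dfrac{\Gamma \vdash \tau_1 :: \phi \leadsto (\tau_1', \Theta_1) \quad \Gamma \vdash \tau_2 :: \phi \leadsto (\tau_2', \Theta_2)}{\Gamma \vdash \tau_1, \tau_2 :: \phi \leadsto ((\tau_1', \tau_2'), \Theta_1 \uplus \Theta_2)}
\qquad
\dfrac{\Gamma \vdash \tau_1 :: \phi \leadsto (\tau_1', \Theta_1) \quad \Gamma \vdash \tau_2 :: \phi \leadsto (\tau_2', \Theta_2)}{\Gamma \vdash \tau_1 | \tau_2 :: \phi \leadsto (\tau_1' | \tau_2', \Theta_1 \uplus \Theta_2)}$$

$$\dfrac{\Gamma \vdash \tau_1 :: \phi \leadsto (\tau_2, \Theta)}{\Gamma \vdash \tau_1^* :: \phi \leadsto (\tau_2^*, \Theta)}
\qquad
\dfrac{\Gamma \vdash E(X) :: \phi \leadsto (\tau', \Theta)}{\Gamma \vdash X :: \phi \leadsto (\tau', \Theta)}$$

Figure 9. Typechecking rules for path filters

Judgment form: $\vdash \{\Theta\}\ e : \tau$

$$\vdash \{\emptyset\}\ e : \tau
\qquad
\dfrac{\vdash \{\Theta\}\ e : \tau \quad \Gamma \vdash e : \tau}{\vdash \{\Theta, Z \mapsto (\Gamma \triangleright \alpha)\}\ e : \tau}$$

Judgment form: $\vdash^{1} \{\Theta\}\ s\ \{\Theta'\}$

$$\vdash^{1} \{\emptyset\}\ s\ \{\emptyset\}
\qquad
\dfrac{\vdash^{1} \{\Theta\}\ s\ \{\Theta'\} \quad \Gamma \vdash^{1} \{\alpha\}\ s\ \{\tau\}}{\vdash^{1} \{\Theta, Z \mapsto (\Gamma \triangleright \alpha)\}\ s\ \{\Theta', Z \mapsto (\Gamma \triangleright \tau)\}}$$

Judgment form: $\vdash_{\mathit{Stmt}} \{\Theta\}\ s\ \{\Theta'\}$

$$\vdash_{\mathit{Stmt}} \{\emptyset\}\ s\ \{\emptyset\}
\qquad
\dfrac{\vdash_{\mathit{Stmt}} \{\Theta\}\ s\ \{\Theta'\} \quad \Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}}{\vdash_{\mathit{Stmt}} \{\Theta, Z \mapsto (\Gamma \triangleright \tau)\}\ s\ \{\Theta', Z \mapsto (\Gamma \triangleright \tau')\}}$$

Judgment form: $\vdash_{\mathit{Path}} \{\Theta\}\ p \leadsto (\Theta', \Theta'')$

$$\vdash_{\mathit{Path}} \{\emptyset\}\ p \leadsto (\emptyset, \emptyset)
\qquad
\dfrac{\vdash_{\mathit{Path}} \{\Theta\}\ p \leadsto (\Theta', \Theta'') \quad \Gamma \vdash_{\mathit{Path}} \{\alpha\}\ p \leadsto (\alpha', \Theta''')}{\vdash_{\mathit{Path}} \{\Theta, Z \mapsto (\Gamma \triangleright \alpha)\}\ p \leadsto ((\Theta', Z \mapsto (\Gamma \triangleright \alpha')), \Theta'' \uplus \Theta''')}$$

Figure 10. Simultaneous typechecking judgments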

Theorem 4 (Soundness). Assume $\Gamma, \tau, \tau'$ have no free type variables
$Z$. Then if $\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}$ then $\Gamma \vdash^{*} \{\tau\}\ [\![s]\!]_{\mathit{Stmt}}\ \{\tau'\}$.

Conversely, another concern is that the source-level type sys- 
tem might be too restrictive. Are there source-level expressions 
whose translations are well-formed, but which are not well-formed 
in the source-level system? This is the question of completeness, 
that is, whether the translation reflects typability. If this complete- 
ness property did not hold, this would indicate that the source type 
system could be made more expressive. Fortunately, however, com- 
pleteness does hold: 

Theorem 5 (Completeness). Assume $\Gamma, \tau, \tau'$ have no free type
variables $Z$. Then if $\Gamma \vdash^{*} \{\tau\}\ [\![s]\!]_{\mathit{Stmt}}\ \{\tau'\}$ then
$\Gamma \vdash_{\mathit{Stmt}} \{\tau\}\ s\ \{\tau'\}$.

6. Path-errors and dead-code analysis 

Besides developing a type system for $\mu$XQ, Colazzo et al. (2006)
studied the problem of identifying subexpressions of the query that
always evaluate to (), but are not syntactically equal to (). Such
"unproductive" subexpressions typically indicate errors in a query.
For example, the query for $y \in x/a$ return $a[]$ is well-formed
in context $\Gamma = x : b[c[]^*, d[]^*]$, but unproductive when evaluated
against $\Gamma$ since $x/a$ will always be empty. Colazzo et al. (2006)
formally defined such path-errors and introduced a type-based
analysis that detects them. In this section, we define path-errors
for updates and derive a path-error analysis for core FLUX. We
first introduce technical machinery, then define path-errors and the
analysis, and prove its correctness.

Consider locations $l$. We will work with distinctly labeled statements
$s_l$ in which each core FLUX subexpression carries a distinct
location $l$. We ignore locations as convenient when we wish to view
$s_l$ as an ordinary statement $s$. Suppose $s$ is distinctly labeled. We
write $s[l]$ for the unique subexpression of $s$ labeled by $l$ and write
$s|_l$ for the result of replacing the subexpression at $l$ in $s$ with skip.
For example, $(\mathsf{iter}[\mathsf{children}[s_l]_{l'}]_{l''})|_{l'} = \mathsf{iter}[\mathsf{skip}]$.
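The two operations on labeled statements are easy to state as code. The sketch below assumes a hypothetical encoding in which every statement is a tuple `(label, op, *args)`, sub-statements are tuples, and all other arguments (labels to rename to, expression text, tests) are strings. The helper `strip` forgets labels, mirroring the convention of ignoring locations when convenient.

```python
def subexpr(s, l):
    """s[l]: the unique subexpression labeled l, or None if absent."""
    label, _op, *args = s
    if label == l:
        return s
    for a in args:
        if isinstance(a, tuple):
            found = subexpr(a, l)
            if found is not None:
                return found
    return None

def erase(s, l):
    """s|_l: replace the subexpression labeled l with skip."""
    label, op, *args = s
    if label == l:
        return (label, "skip")
    return (label, op, *[erase(a, l) if isinstance(a, tuple) else a
                         for a in args])

def strip(s):
    """Forget all labels, yielding an ordinary statement."""
    _label, op, *args = s
    return (op, *[strip(a) if isinstance(a, tuple) else a for a in args])

# iter[children[delete]] with labels l, l' and l'' from the inside out
labeled = ("l''", "iter", ("l'", "children", ("l", "delete")))
erased = erase(labeled, "l'")
```

With this encoding, `strip(erased)` is the unlabeled statement iter[skip], as in the example above.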

We now define a form of path-errors suitable for updates, based
on replacing subexpressions with the trivial update skip instead of
the empty sequence ().

Definition 1. Suppose $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$, where $s$ is distinctly
labeled. We say $s$ is unproductive at $l$ provided
$\gamma; v \vdash s \Rightarrow^{U} v' \iff \gamma; v \vdash s|_l \Rightarrow^{U} v'$ for every $\gamma \in [\![\Gamma]\!]$, $v \in [\![\tau]\!]$, $v' \in [\![\tau']\!]$.
Recall that update evaluation is functional, so this means that $s$ and
$s|_l$ are equivalent over inputs from $[\![\Gamma]\!]$, $[\![\tau]\!]$.

Moreover, we say that $s$ has an update path-error at $l$ provided
$s$ is unproductive at $l$ and $s[l] \neq \mathsf{skip}$, and say that $s$ is update
path-correct if $s$ has no update path-errors.

We define a static analysis for identifying update path-errors via
the rules in Figure 11. The main judgment is $\Gamma \vdash^{a} \{\tau\}\ s_l\ \{\tau'\}\ \&\ L$,
meaning $s$ is well-formed and is unproductive at each $l \in L$. We
employ an auxiliary judgment $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s_l\ \{\tau'\}\ \&\ L$ to handle
iteration. We also define a "conditional union" operation:

Note that the analysis is intraprocedural. It gives up when we
consider a procedure call $P(\bar{e})$: we conservatively assume that
there is no path error at $P(\bar{e})$, and we do not proceed to analyze the
body of $P$. We can, of course, extend the analysis to declarations
$\Delta$ by analyzing each procedure body individually.

Footnote: Arguably, the term "path-errors" is inaccurate in that there are expressions
such as for $y \in ()$ return $x$ that do not mention path expressions, yet
contain path-errors. Nevertheless, we follow the existing terminology here.

Figure 11. Path-error analysis for updates

We first establish that the analysis produces results whenever $s$
is well-formed. This is straightforward by induction on derivations.

Lemma 1.

1. If $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}$ then there exists $L$ such that
$\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}\ \&\ L$.

2. If $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}$ then there exists $L$ such that
$\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}\ \&\ L$.

The goal of the analysis is to conservatively underestimate the
set of possible unproductive locations in $s$.

Theorem 6 (Path-Error Analysis Soundness).

1. If $\Gamma \vdash^{a} \{\tau\}\ s\ \{\tau'\}\ \&\ L$ then $s$ is unproductive at every $l \in L$.

2. If $\Gamma \vdash_{\mathit{iter}} \{\tau\}\ s\ \{\tau'\}\ \&\ L$ then iter[$s$] is unproductive at
every $l \in L$.

Moreover, all of the labels in the set $\{l \in L \mid s[l] \neq \mathsf{skip}\}$
can be reported as update path-errors and used to optimize $s$ by
replacing each $s[l]$ with skip.

Using more sophisticated rules for typechecking let- and for-expressions,
Colazzo et al. (2006) were also able to show that their
path-error analysis is complete for $\mu$XQ (without recursion); thus,
path-correctness is decidable for the fragment of XQuery they
studied, a nontrivial result. Similar techniques can be used to make
update path-error analysis more precise, but it is not obvious that
this yields a complete analysis, even in the absence of recursion.
We leave this issue for future work.

It is, of course, also of interest to perform path-error analysis at 
the source level, so that the errors can be reported in terms familiar 
to the user. We believe that the path-error analysis can be "lifted" to 
the source type system, but leave this for future work. However, it 
appears that many path-errors show up in the source type system as 
empty substitutions $\Theta$ resulting from analyzing path expressions.



7. Related work 

Other database update languages. Liefke and Davidson (1999)
introduced CPL+, a typed language for updating complex-object
databases using path-based insert, update, and delete operations.
High-level CPL+ updates were translated to a simpler core
language with orthogonal operations for iteration, navigation,
insertion/deletion, and replacement. The FLUX core language was
strongly influenced by CPL+.

Static typing for XML processing. We will focus on only the most
closely related work on XML typechecking; Møller and Schwartzbach
(2005) provide a much more complete survey of type systems for
XML transformation languages.

Hosoya et al. (2005) introduced XDuce, the first statically
typed XML transformation language based on regular expressions.
Fernandez et al. (2001) introduced many of the ideas for
using XDuce-style regular expression subtyping for typechecking
an XML query language based on monadic comprehensions.
XQuery's type system (Draper et al. 2007) is also based on regular
expression types and subtyping, but its rules for typechecking
iteration are relatively imprecise: they discard information about
the order and multiplicity of the elements of a sequence. As
discussed by Cheney (2008), taking this approach to typechecking
iterations in updates would be disastrous since many updates
iterate over a part of the database while leaving its structure intact.
Colazzo et al. (2006) showed how to provide more precise regular
expression types to XQuery for-iteration; we have already
discussed this work in the body of the paper. Cheney (2008) showed
how to add subtyping and subsumption to $\mu$XQ and FLUX while
retaining decidable typechecking.

XML update languages. Cheney (2007) provided a detailed
discussion of XML update language proposals and compared them
with the Flux approach. Here, we will only discuss closely related
or more recent work.

Calcagno et al. (2005) investigated DOM-style XML updates
using context logic, a logic of "trees with holes". Calcagno et al.
(200a) studied a Hoare-style logic for sequences of atomic update
operations on unordered XML. Gardner et al. (2008) extended this
approach to ordered XML and while-programs over atomic DOM
updates. This approach is very promising for reasoning about
low-level DOM updates, for example in Java or JavaScript programs. It
should be possible to translate core FLUX to their variant of DOM;
it would be interesting to see whether FLUX type information can also
be compiled down to context logic in an appropriate way.

The W3C XQuery Update Facility (Chamberlin et al. 2008) has
been under development for several years. However, the typing
rules in the current draft treat updates as expressions of type (),
and to our knowledge this type system has not been proved sound.

Ghelli et al. (2007b) have developed XQueryU, a variant of
XQuery! that is translated to an "algebraic" core language intended
for optimization. However, the semantics of the core language is
defined by translation back to XQueryU, which seems circular.

Static typechecking has not been studied for any other extant
XML update language proposals, even though the W3C's XQuery
Update Facility Requirements document (Chamberlin and Robie
2005) lists static typechecking as a strong requirement. FLUX
shows that applying well-known functional language design
principles leads to a language with a relatively simple semantics and
relatively straightforward type system.

Static analysis techniques have been studied for only a few
of these languages. Benedikt et al. (2005a) and Benedikt et al.
(2005b) studied static analysis techniques for optimizing updates
in UpdateX, an earlier XML update language proposal due to
Sur et al. (2004). Ghelli et al. (2007a) have developed a
commutativity analysis for determining when two side-effecting
expressions in XQuery! can be reordered. No prior work has addressed
path-error or dead code analysis for XML updates.

The design goals of many of these proposals differ from those 
that motivate this work. FLUX is not meant to be a full-fledged 
programming language for mutable XML data. Instead, it is meant 
to play a role for XML and XQuery similar to that of SQL's update 
facilities relative to relational databases and SQL. Its goal is only to 
be expressive enough for typical updates to XML databases while 
remaining simple and statically typecheckable. 

Mutability in functional languages. FLUX takes a "purely functional"
approach to typechecking updates. The type of an update
reflects the changes to the mutable store an update may make. 
This is similar to side-effect encapsulation using monads or ar- 
rows in Haskell. An alternative possibility might be to use ML- 
like references. This could easily handle updates to parts of an 
XML database whose type is fixed; however, handling updates that 
change the type of a part of the database would likely be problem- 
atic, due to aliasing issues. FLUX does not allow aliasing of the 
mutable store, so avoids this problem. 

8. Extensions and future work 

Additional XQuery features. To simplify the discussion, we have
omitted features such as attributes, comments, and processing
instructions that are present in the official XQuery data model, as
well as the XPath axes needed to access them. We have also
omitted the many additional base types and built-in functions (such
as position() or last()) present in full XQuery/XPath. All of
these features can be added without damaging the formal properties
of the core language.

We have also omitted the descendant axis. Many DTDs and
XML Schemas encountered in database applications of XML are
nonrecursive and "shallow" (Choi 2002; Bex et al. 2004). Thus, in
practice, vertical recursion (the descendant axis //) can usually be
avoided. Simple updates involving // can, however, be simulated
using recursive update procedures; in fact, for non-recursive input
types, updates involving // can often already be expressed in
Flux. Further work is needed to understand the expressiveness and
usability tradeoffs involved in typechecking more complex updates
involving recursive types.



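Transformations. The XQuery Update Facility (Chamberlin et al.
2008) includes transformations, which allow running an update
operation within an XQuery expression, with side-effects confined
(somewhat like runST in Haskell). Such a facility can easily be
added using Flux updates:

$$e ::= \cdots \mid \mathsf{transform}\ e\ \mathsf{by}\ s$$

with semantics and typing rules:

$$\dfrac{\gamma \vdash e \Rightarrow v \quad \gamma; v \vdash s \Rightarrow^{U} v'}{\gamma \vdash \mathsf{transform}\ e\ \mathsf{by}\ s \Rightarrow v'}
\qquad
\dfrac{\Gamma \vdash e : \tau_1 \quad \Gamma \vdash^{*} \{\tau_1\}\ s\ \{\tau_2\}}{\Gamma \vdash \mathsf{transform}\ e\ \mathsf{by}\ s : \tau_2}$$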



Dynamic typechecking, incremental validation and maintenance.
We believe it is important to combine FLUX-style static typechecking
with efficient dynamic techniques in order to handle cases
where static type information is imprecise. Barbosa et al. (2004)
and Balmin et al. (2004) have studied efficient incremental validation
techniques for checking that sequences of atomic updates preserve
a database's schema. These techniques impose (manageable,
but nonzero) run-time costs per atomic update operation and storage
overhead proportional to the database size; also, they require
that the input and output types are equal, a significant limitation
compared to FLUX.

Efficient implementation within XML databases. We have built
a prototype FLUX interpreter in OCaml, in order to validate our
type system and normalization translation designs and experiment
with variations. The obvious next step is developing efficient
implementations of Flux, particularly within XML database systems.
Liefke and Davidson (1999) investigated efficient implementation
techniques for CPL+ updates to complex-object databases, which
have much in common with XML databases. One initial implementation
strategy could simply be to generate XQuery! or XQuery Updates
from core FLUX after normalization, typechecking and high-level
optimization; this should not be difficult since these languages
are more expressive than FLUX. However, more sophisticated
techniques may be necessary to obtain good performance.

9. Conclusions 

The problem of updating XML databases poses many challenges to
language design. In previous work, we introduced FLUX, a simple core
language for XML updates, inspired in large part by the language CPL+
introduced by Liefke and Davidson (1999) for updating complex object
databases. In contrast to all other update proposals of which we are
aware, FLUX preserves the good features of XQuery such as its purely
functional semantics, while offering features convenient for updating
XML. Moreover, FLUX is the first proposal for updating XML to be
equipped with a sound, static type system.

In this paper we have further developed the foundations of 
Flux, relaxing the limitations present in our preliminary proposal. 
First, we have extended its operational semantics and type sys- 
tem to handle recursive types and updates. This turned out to be 
straightforward. Second, although the FLUX core language is easy 
to understand, typecheck and optimize, it is not easy to use. There- 
fore, we have developed a high-level source language for updates, 
and shown how to translate it to core FLUX. Since it is difficult 
to propagate useful type error information from translated updates 
back to source updates, we have also developed a type system for 
the source language, and validated its design by proving that the 
translation both preserves and reflects typability. Third, we devel- 
oped a novel definition of update path-errors, a form of dead code 
analysis, and introduced a static analysis that identifies them. 

At present we have implemented a proof-of-concept prototype FLUX
interpreter, including typechecking for the source language and core
language, normalization, and path-error analysis. There are many
possible directions for future work; the most immediate is to develop
efficient optimizing implementations of FLUX within existing XML
databases or other XML-processing systems.
A. Semantics and type system for μXQ queries

We will use an XQuery-like core language called μXQ, introduced by
Colazzo et al. (2006). Following that paper, we distinguish between
tree variables x̄ ∈ TVar, introduced by for, and forest variables
x ∈ Var, introduced by let. The other syntactic classes of our variant
of μXQ include labels l, m, n ∈ Lab and expressions e ∈ Expr; the
abstract syntax of expressions is defined by the following BNF
grammar:

    e ::= () | e, e′ | n[e] | w | x | let x = e in e′
        | true | false | if c then e else e′ | e ≈ e′
        | x̄ | x̄/child | e :: n | for x̄ ∈ e return e′

The distinguished variables x̄ in for x̄ ∈ e return e′(x̄) and x in
let x = e in e″(x) are bound in e′(x̄) and e″(x) respectively. Here and
elsewhere, we employ common conventions such as considering
expressions containing bound variables equivalent up to α-renaming and
employing a richer concrete syntax including, for example,
parentheses.

Recursive queries can be added to μXQ in the same manner as in XQuery
without damaging the properties of the system needed in this paper.

To simplify the presentation, we split μXQ's projection operation
x̄ child :: l into two expressions: child projection (x̄/child), which
returns the children of x̄, and node name filtering (e :: n), which
evaluates e to an arbitrary sequence and selects the nodes labeled n.
Thus, the ordinary child axis expression x̄ child :: n is syntactic
sugar for (x̄/child) :: n and the "wildcard" child axis is definable as
x̄ child :: * = x̄/child. We also consider only one built-in operation,
string equality.
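The decomposition of the child axis into child projection followed by name filtering can be sketched in Python. The tree encoding (strings for text, `(label, children)` pairs for elements) and the function names are assumptions made for this example only.

```python
def child(tree):
    """x/child: the children of a single tree (none for a text node)."""
    return [] if isinstance(tree, str) else list(tree[1])


def name_filter(forest, n):
    """e :: n: keep the elements of the forest that are labeled n."""
    return [t for t in forest if not isinstance(t, str) and t[0] == n]


def child_axis(tree, n):
    """x child :: n, desugared as (x/child) :: n; '*' is the wildcard."""
    kids = child(tree)
    return kids if n == "*" else name_filter(kids, n)


x = ("a", [("b", []), "t", ("c", []), ("b", ["u"])])
only_b = child_axis(x, "b")
everything = child_axis(x, "*")
```

Here `only_b` keeps the two b elements in document order, while the wildcard case returns exactly the result of child projection, matching the equation x̄ child :: * = x̄/child.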

B. Type soundness for updates 

Type soundness for updates relies on pre-existing results for type 
soundness for queries, which we repeat here: 

Theorem 7 (Query soundness). If Γ ⊢ e : τ and γ ∈ ⟦Γ⟧ then
γ ⊢ e ⇒ v with v ∈ ⟦τ⟧.

We need the following lemmas summarizing properties of test
subtyping and of the operational behavior of iteration.

Lemma 2 (Subtyping and tests). If t ∈ ⟦α⟧ then α <: φ if and only if
t ∈ ⟦φ⟧.

Proof. First note that if t ∈ ⟦α⟧ and α <: φ then t ∈ ⟦φ⟧. For the
reverse direction, suppose t ∈ ⟦α⟧ and α ≮: φ, and consider all
combinations of cases. □

Lemma 3 (Iteration). If γ; v₁ ⊢ iter[s] ⇒ᵁ v₁′ and
γ; v₂ ⊢ iter[s] ⇒ᵁ v₂′ are derivable then
γ; v₁, v₂ ⊢ iter[s] ⇒ᵁ v₁′, v₂′ is derivable.
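The content of Lemma 3 is that iteration distributes over forest concatenation. A small Python sketch makes this concrete; the list-of-trees encoding and the names `iter_update` and `dup` are assumptions for this example, and the single-tree update is modeled as a function from a tree to a forest.

```python
def iter_update(s, forest):
    """iter[s]: apply the single-tree update s to each tree in turn
    and concatenate the resulting forests in order."""
    out = []
    for tree in forest:
        out.extend(s(tree))
    return out


def dup(tree):
    # Hypothetical single-tree update: replace a tree by two copies of it.
    return [tree, tree]


v1 = [("a", []), "t"]
v2 = [("b", [])]
whole = iter_update(dup, v1 + v2)
parts = iter_update(dup, v1) + iter_update(dup, v2)
```

Running the update on the concatenation `v1 + v2` gives the same forest as concatenating the two separate results, which is the property the proof of Theorem 8 relies on in the cases for sequence and Kleene-star types.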

Theorem 8 (Update soundness).

1. If Γ ⊢* {τ} s {τ′}, v ∈ ⟦τ⟧, and γ ∈ ⟦Γ⟧, then γ; v ⊢ s ⇒ᵁ v′
implies v′ ∈ ⟦τ′⟧.

2. If Γ ⊢iter {τ} s {τ′}, v ∈ ⟦τ⟧, and γ ∈ ⟦Γ⟧, then
γ; v ⊢ iter[s] ⇒ᵁ v′ implies v′ ∈ ⟦τ′⟧.

Proof. Parts (1) and (2) must be proved simultaneously by induc- 
tion; in each case the induction is on the typing derivation. The 
cases involving standard constructs (if, let, skip, sequencing) are 
omitted. For each case, we first show the typing derivation and then 
the (unique) corresponding operational derivation that can be con- 
structed, with remarks as appropriate. 
    Γ ⊢ () : ()        Γ ⊢ w : string

    Γ ⊢ true : bool    Γ ⊢ false : bool

    Γ ⊢ c : bool    Γ ⊢ e₁ : τ₁    Γ ⊢ e₂ : τ₂
    --------------------------------------------
    Γ ⊢ if c then e₁ else e₂ : τ₁|τ₂

    x̄:n[τ] ∈ Γ
    ------------------
    Γ ⊢ x̄/child : τ

    Γ ⊢ e : τ    τ :: n ⇒ τ′
    --------------------------
    Γ ⊢ e :: n : τ′

    Γ ⊢ e₁ : τ₁    Γ ⊢ x̄ in τ₁ → e₂ : τ₂
    --------------------------------------
    Γ ⊢ for x̄ ∈ e₁ return e₂ : τ₂

    Γ ⊢ e : τ    τ <: τ′
    ----------------------
    Γ ⊢ e : τ′

Figure 13. Query well-formedness.



• Case (insert):

    Γ ⊢ e : τ                      γ ⊢ e ⇒ v
    --------------------------     --------------------------
    Γ ⊢* {()} insert e {τ}         γ; () ⊢ insert e ⇒ᵁ v

Follows by query soundness (Theorem 7).

• Case (delete):

    Γ ⊢* {τ} delete {()}           γ; v ⊢ delete ⇒ᵁ ()

Immediate.

• Case (rename):

    Γ ⊢* {n′[τ]} rename n {n[τ]}   γ; n′[v] ⊢ rename n ⇒ᵁ n[v]

Follows since n′[v] ∈ ⟦n′[τ]⟧ implies v ∈ ⟦τ⟧ so n[v] ∈ ⟦n[τ]⟧.

• Case (snapshot):

    Γ, x:τ ⊢* {τ} s {τ′}               γ[x := v]; v ⊢ s ⇒ᵁ v′
    --------------------------------   --------------------------------
    Γ ⊢* {τ} snapshot x in s {τ′}      γ; v ⊢ snapshot x in s ⇒ᵁ v′

Follows by induction, using the fact that v ∈ ⟦τ⟧ so that
γ[x := v] ∈ ⟦Γ, x:τ⟧.
• Case (test1):

    α <: φ    Γ ⊢₁ {α} s {τ}       t ∈ ⟦φ⟧    γ; t ⊢ s ⇒ᵁ v
    --------------------------     ----------------------------
    Γ ⊢₁ {α} φ?s {τ}               γ; t ⊢ φ?s ⇒ᵁ v

This case follows immediately by appealing to Lemma 2 and then the
induction hypothesis.

• Case (test2):

    α ≮: φ                         t ∉ ⟦φ⟧
    --------------------------     ----------------------------
    Γ ⊢₁ {α} φ?s {α}               γ; t ⊢ φ?s ⇒ᵁ t

This case is immediate by Lemma 2.

• Case (children):

    Γ ⊢* {τ} s {τ′}                      γ; v ⊢ s ⇒ᵁ v′
    ----------------------------------   ----------------------------------
    Γ ⊢₁ {n[τ]} children[s] {n[τ′]}      γ; n[v] ⊢ children[s] ⇒ᵁ n[v′]

Clearly, since n[v] ∈ ⟦n[τ]⟧, we must have v ∈ ⟦τ⟧. By induction, we
have that v′ ∈ ⟦τ′⟧, from which it is immediate that
n[v′] ∈ ⟦n[τ′]⟧.
• Case (right):

    Γ ⊢* {()} s {τ′}                 γ; () ⊢ s ⇒ᵁ v′
    ------------------------------   ------------------------------
    Γ ⊢* {τ} right[s] {τ, τ′}        γ; v ⊢ right[s] ⇒ᵁ v, v′

By assumption, v ∈ ⟦τ⟧; by induction, we have that v′ ∈ ⟦τ′⟧, so
v, v′ ∈ ⟦τ, τ′⟧.

• Case (left): Symmetric.

• Case (iter):

    Γ ⊢iter {τ} s {τ′}
    --------------------------
    Γ ⊢* {τ} iter[s] {τ′}

We proceed using induction hypothesis (3).

• Case P(ē): Suppose the typing derivation is of the form

    P(x̄ : τ̄) : σ₁ ⇒ σ₂ ≜ s ∈ Δ    σ₁′ <: σ₁
    Γ ⊢ e₁ : τ₁′    τ₁′ <: τ₁    ···    Γ ⊢ eₙ : τₙ′    τₙ′ <: τₙ
    -------------------------------------------------------------
    Γ ⊢* {σ₁′} P(ē) {σ₂}

Hence the operational semantics derivation must be of the form:

    P(x̄ : τ̄) : σ₁ ⇒ σ₂ ≜ s ∈ Δ
    γ ⊢ e₁ ⇒ v₁    ···    γ ⊢ eₙ ⇒ vₙ
    γ[x₁ := v₁, …, xₙ := vₙ]; v ⊢ s ⇒ᵁ v′
    ----------------------------------------
    γ; v ⊢ P(ē) ⇒ᵁ v′

Then by query soundness and the definition of <: we have
vᵢ ∈ ⟦τᵢ′⟧ ⊆ ⟦τᵢ⟧ for each i ∈ {1, …, n}. Hence,
γ[x₁ := v₁, …, xₙ := vₙ] ∈ ⟦Γ, x₁:τ₁, …, xₙ:τₙ⟧. Moreover,
v ∈ ⟦σ₁′⟧ ⊆ ⟦σ₁⟧, again by definition of subtyping, so by induction,
we can conclude that v′ ∈ ⟦σ₂⟧ as desired.
• Case (iter1): If the derivation is of the form

    Γ ⊢iter {()} s {()}            γ; () ⊢ iter[s] ⇒ᵁ ()

then the conclusion is immediate.

• Case (iter2): If the derivation is of the form

    Γ ⊢₁ {α} s {τ}
    ----------------------
    Γ ⊢iter {α} s {τ}

then by assumption, v ∈ ⟦α⟧. Hence, v = t, (), so by induction
hypothesis (2), we have γ; t ⊢ s ⇒ᵁ v′ with v′ ∈ ⟦τ⟧ and can derive

    γ; t ⊢ s ⇒ᵁ v′    γ; () ⊢ iter[s] ⇒ᵁ ()
    -------------------------------------------
    γ; t, () ⊢ iter[s] ⇒ᵁ v′, ()
• Case (iter3): If the derivation is of the form

    Γ ⊢iter {τ₁} s {τ₁′}    Γ ⊢iter {τ₂} s {τ₂′}
    ----------------------------------------------
    Γ ⊢iter {τ₁, τ₂} s {τ₁′, τ₂′}

then we must have v = v₁, v₂ where vᵢ ∈ ⟦τᵢ⟧ for i ∈ {1, 2}; by
induction we have γ; vᵢ ⊢ iter[s] ⇒ᵁ vᵢ′, where vᵢ′ ∈ ⟦τᵢ′⟧ for
i ∈ {1, 2}. Hence, by Lemma 3 we can conclude
γ; v₁, v₂ ⊢ iter[s] ⇒ᵁ v₁′, v₂′ where v₁′, v₂′ ∈ ⟦τ₁′, τ₂′⟧.

• Case (iter4): If the derivation is of the form

    Γ ⊢iter {τ₁} s {τ₁′}    Γ ⊢iter {τ₂} s {τ₂′}
    ----------------------------------------------
    Γ ⊢iter {τ₁|τ₂} s {τ₁′|τ₂′}

then we must have v ∈ ⟦τ₁⟧ ∪ ⟦τ₂⟧; the cases are symmetric, so without
loss of generality suppose v ∈ ⟦τ₁⟧. By induction we have that
γ; v ⊢ iter[s] ⇒ᵁ v′ where v′ ∈ ⟦τ₁′⟧ ⊆ ⟦τ₁′|τ₂′⟧.
• Case (iter5): If the derivation is of the form

    Γ ⊢iter {τ₁} s {τ₂}
    --------------------------
    Γ ⊢iter {τ₁*} s {τ₂*}

then since v ∈ ⟦τ₁*⟧ we must have that either v = () (in which case
the conclusion is immediate) or v = v₁, …, vₙ where each vᵢ ∈ ⟦τ₁⟧.
By induction, we can obtain derivations γ; vᵢ ⊢ iter[s] ⇒ᵁ vᵢ′ where
vᵢ′ ∈ ⟦τ₂⟧ for each i ∈ {1, …, n}; hence, by repeated application of
Lemma 3 we can conclude that γ; v ⊢ iter[s] ⇒ᵁ v′, where by
definition v′ = v₁′, …, vₙ′ ∈ ⟦τ₂*⟧.

• Case (iter6): If the derivation is of the form

    Γ ⊢iter {E(X)} s {τ}
    ----------------------
    Γ ⊢iter {X} s {τ}

then we have that v ∈ ⟦X⟧ = ⟦E(X)⟧ so the induction hypothesis
applies directly and we can conclude that v′ ∈ ⟦τ⟧.



This exhausts all cases and completes the proof. □
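To make the operational rules used in the proof cases easier to follow, here is a minimal Python sketch of an evaluator for a fragment of the core update language. It is an illustration only, not the prototype interpreter: statements are encoded as tuples, forests as lists, `insert` carries an already-evaluated forest rather than a query, tests are Python predicates, and variables, snapshot and procedures are omitted. All names are assumptions made for this example.

```python
def run(s, v):
    """Evaluate core update s on input forest v (a list of trees)."""
    op = s[0]
    if op == "skip":
        return v
    if op == "seq":
        return run(s[2], run(s[1], v))
    if op == "insert":                 # only defined on the empty forest
        assert v == []
        return list(s[1])
    if op == "delete":
        return []
    if op == "rename":                 # input must be one element n'[v]
        [(_, kids)] = v
        return [(s[1], kids)]
    if op == "children":               # update the children of one element
        [(label, kids)] = v
        return [(label, run(s[1], kids))]
    if op == "left":                   # run s on () and put the result first
        return run(s[1], []) + v
    if op == "right":                  # run s on () and put the result last
        return v + run(s[1], [])
    if op == "iter":                   # apply s to each tree separately
        out = []
        for t in v:
            out += run(s[1], [t])
        return out
    if op == "test":                   # phi?s: run s only when phi holds
        [t] = v
        return run(s[2], v) if s[1](t) else v
    raise ValueError(op)


def is_b(t):
    return not isinstance(t, str) and t[0] == "b"


# Append c[] as the last child of every b child of the root element.
stmt = ("children", ("iter", ("test", is_b,
        ("children", ("right", ("insert", [("c", [])]))))))
doc = [("a", [("b", []), ("x", [])])]
updated = run(stmt, doc)
```

Each branch mirrors one case of the proof: for example, the `test` branch returns its input unchanged when the test fails, which is why the (test2) case assigns the output type α.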



C. Normalizing and typechecking source updates 

Figures 14, 15, 16 and 17 show the main typechecking judgments for
source statements, simple updates, and paths. The statement
typechecking rules are straightforward; note however that we require
that both statements and updates start and end with atomic types. The
simple update typechecking rules each follow the procedure outlined
above. The path typechecking rules match p against the input type α.
Note that paths may bind variables, and the same variable may be bound
to different types for different cases; this is why we need to include
contexts Γ in the substitutions Θ. In certain rules, we choose fresh
type variables. The scope with respect to which we require freshness
is all type variables mentioned elsewhere in the surrounding
derivation.

For many of the typechecking judgments, we need to typecheck an
expression against all of the bindings of a context-tagged
substitution Θ. We therefore introduce several simultaneous
typechecking judgments shown in Figure 18.

C.1 Metatheory

For this source language, we first need some auxiliary lemmas to
establish soundness.

Lemma 4. If ⊢₁ {Θ ⊕ x} s {Θ′ ⊕ x} then
⊢₁ {Θ} snapshot x in s {Θ′}.

Proof. Induction on derivation of ⊢₁ {Θ ⊕ x} s {Θ′ ⊕ x}. □

We also need an auxiliary notation Θ|Θ′, which merges two
context-tagged type substitutions provided their bindings and contexts
match:

    ∅|∅ = ∅
    (Θ, Z ↦ (Γ ▷ τ)) | (Θ′, Z ↦ (Γ ▷ τ′)) = (Θ|Θ′), Z ↦ (Γ ▷ τ|τ′)

Lemma 5. If ⊢ {Θ} e : bool and ⊢₁ {Θ} s {Θ′} then
⊢₁ {Θ} if e then s else skip {Θ|Θ′}.

Proof. Induction on derivation of ⊢₁ {Θ} s {Θ′}. □

Lemma 6.
1. If the free type variables of τ are disjoint from those of Θ′ then
τ⟨Θ ⊎ Θ′⟩ = τ⟨Θ⟩.

2. If Γ ⊢ τ :: φ ⇝ (τ′, Θ) then τ = τ′⟨Θ⟩.

3. If Γ ⊢path {α} p ⇝ (α′, Θ) then α = α′⟨Θ⟩.

4. If ⊢path {Θ} p ⇝ (Θ′, Θ″) then Θ = Θ′⟨Θ″⟩.

Proof. For part (1), proof is by induction on the structure of types.
For part (2), proof is by induction on the structure of derivations,
using part (1) for the cases involving types τ₁, τ₂ and τ₁|τ₂. Parts
(3) and (4) follow by simultaneous induction on derivations. □

Lemma 7. If Γ ⊢ τ :: φ ⇝ (τ′, Θ) and ⊢₁ {Θ} s {Θ′} then
Γ ⊢iter {τ} φ?s {τ′⟨Θ′⟩}.

Proof. Induction on derivation of Γ ⊢ τ :: φ ⇝ (τ′, Θ). □

Theorem 9 (Soundness). Assume Γ, α, α′, Θ₁ have no free type
variables Z.

1. If Γ ⊢stmt {τ} s {τ′} then Γ ⊢₁ {τ} ⟦s⟧stmt {τ′}.

2. If Γ ⊢upd {α} u {α′} then Γ ⊢₁ {α} ⟦u⟧upd {α′}.

3. If Γ ⊢path {α} p ⇝ (α′, Θ) and ⊢₁ {Θ} s {Θ′} then
Γ ⊢₁ {α} ⟦p⟧path(s) {α′⟨Θ′⟩}.

4. If ⊢path {Θ₁} p ⇝ (Θ₂, Θ₃) and ⊢₁ {Θ₃} s {Θ₃′} then
⊢₁ {Θ₁} ⟦p⟧path(s) {Θ₂⟨Θ₃′⟩}.

Now, to prove completeness, we need lemmas establishing that the
earlier lemmas are invertible:

Lemma 8. If ⊢₁ {Θ} snapshot x in s {Θ′} then
⊢₁ {Θ ⊕ x} s {Θ′ ⊕ x}.

Proof. Induction on the structure of derivations of
⊢₁ {Θ} snapshot x in s {Θ′} followed by inversion. □

Lemma 9. If ⊢₁ {Θ} if e then s else skip {Θ′} then there exists Θ″
such that Θ′ = Θ|Θ″ and ⊢ {Θ} e : bool and ⊢₁ {Θ} s {Θ″}.

Proof. Induction on the structure of derivations of
⊢₁ {Θ} if e then s else skip {Θ′} followed by inversion. □

Lemma 10. If Γ ⊢iter {τ} φ?s {τ′} then there exist τ″, Θ, Θ′ such that
τ″⟨Θ′⟩ = τ′, Γ ⊢ τ :: φ ⇝ (τ″, Θ) and ⊢₁ {Θ} s {Θ′}.

Proof. Induction on the structure of derivations of
Γ ⊢iter {τ} φ?s {τ′} and then using inversion and properties of
substitutions. □

Theorem 10 (Completeness).

1. If Γ ⊢₁ {τ} ⟦s⟧stmt {τ′} then Γ ⊢stmt {τ} s {τ′}.

2. If Γ ⊢₁ {α} ⟦u⟧upd {α′} then Γ ⊢upd {α} u {α′}.

3. If Γ ⊢₁ {α} ⟦p⟧path(s) {α′} then there exist α″, Θ, Θ′ such that
α′ = α″⟨Θ′⟩, Γ ⊢path {α} p ⇝ (α″, Θ), and ⊢₁ {Θ} s {Θ′}.

4. If ⊢₁ {Θ} ⟦p⟧path(s) {Θ′} then there exist Θ₁, Θ₂, Θ₂′ such that
Θ′ = Θ₁⟨Θ₂′⟩, ⊢path {Θ} p ⇝ (Θ₁, Θ₂), and ⊢₁ {Θ₂} s {Θ₂′}.

Proof. Parts (3) and (4) follow by simultaneous induction using
previous lemmas. Parts (1) and (2) then follow by simultaneous
induction, using parts (3) and (4). □

    Γ ⊢stmt {τ} s {τ′}

    Γ ⊢stmt {τ} s₁ {τ′}    Γ ⊢stmt {τ′} s₂ {τ″}
    ---------------------------------------------
    Γ ⊢stmt {τ} s₁; s₂ {τ″}

    Γ ⊢ e : bool    Γ ⊢stmt {τ} s {τ′}
    ------------------------------------
    Γ ⊢stmt {τ} IF e THEN s {τ|τ′}

    Γ ⊢ e : τ₀    Γ, x:τ₀ ⊢stmt {τ} s {τ′}
    ----------------------------------------
    Γ ⊢stmt {τ} let x = e in s {τ′}

    Γ ⊢upd {α} u {α′}
    --------------------
    Γ ⊢stmt {α} u {α′}

Figure 14. Typechecking rules for compound updates



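    Γ ⊢upd {α} u {α′}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} left[insert e] {Θ′}
    -------------------------------------------------------
    Γ ⊢upd {α} INSERT BEFORE p VALUE e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} right[insert e] {Θ′}
    -------------------------------------------------------
    Γ ⊢upd {α} INSERT AFTER p VALUE e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} children[left[insert e]] {Θ′}
    ----------------------------------------------------------------
    Γ ⊢upd {α} INSERT AS FIRST INTO p VALUE e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} children[right[insert e]] {Θ′}
    ----------------------------------------------------------------
    Γ ⊢upd {α} INSERT AS LAST INTO p VALUE e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} delete {Θ′}
    ------------------------------------------------
    Γ ⊢upd {α} DELETE p {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} children[delete] {Θ′}
    --------------------------------------------------------
    Γ ⊢upd {α} DELETE FROM p {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} rename n {Θ′}
    --------------------------------------------------
    Γ ⊢upd {α} RENAME p TO n {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} delete; insert e {Θ′}
    --------------------------------------------------------
    Γ ⊢upd {α} REPLACE p WITH e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢₁ {Θ} children[delete; insert e] {Θ′}
    ------------------------------------------------------------------
    Γ ⊢upd {α} REPLACE IN p WITH e {α′⟨Θ′⟩}

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢stmt {Θ} s {Θ′}
    ----------------------------------------------
    Γ ⊢upd {α} UPDATE p BY s {α′⟨Θ′⟩}

Figure 15. Typechecking rules for simple updates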
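Each rule in Figure 15 pairs a source-level update with the core statement that is typechecked at the targets selected by the path p. That correspondence can be written down as a small lookup in Python. The tuple encoding of core statements and the name `core_action` are assumptions made for this sketch; UPDATE p BY s is left out because its body is a source statement that would itself need to be normalized.

```python
def core_action(kind, arg=None):
    """Core statement checked at the target of the path for a simple
    source update, following the second premise of each Figure 15 rule.

    Hypothetical encoding of core statements as tuples:
    ("insert", e), ("delete",), ("rename", n), ("seq", s1, s2),
    ("left", s), ("right", s), ("children", s).
    """
    delete = ("delete",)
    table = {
        "INSERT BEFORE": lambda e: ("left", ("insert", e)),
        "INSERT AFTER": lambda e: ("right", ("insert", e)),
        "INSERT AS FIRST INTO": lambda e: ("children", ("left", ("insert", e))),
        "INSERT AS LAST INTO": lambda e: ("children", ("right", ("insert", e))),
        "DELETE": lambda _: delete,
        "DELETE FROM": lambda _: ("children", delete),
        "RENAME": lambda n: ("rename", n),
        "REPLACE": lambda e: ("seq", delete, ("insert", e)),
        "REPLACE IN": lambda e: ("children", ("seq", delete, ("insert", e))),
    }
    return table[kind](arg)


replace_in = core_action("REPLACE IN", "e")
insert_first = core_action("INSERT AS FIRST INTO", "e")
```

Keeping the path and the action separate in this way reflects the structure of the rules: the path premise is identical in every rule, and only the core statement checked against Θ changes.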



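    Γ ⊢ τ :: φ ⇝ (τ′, Θ)

    Γ ⊢ () :: φ ⇝ ((), ∅)

    α ≮: φ
    ------------------------
    Γ ⊢ α :: φ ⇝ (α, ∅)

    α <: φ    Z fresh
    ------------------------------------
    Γ ⊢ α :: φ ⇝ (Z, Z ↦ (Γ ▷ α))

    Γ ⊢ τ₁ :: φ ⇝ (τ₁′, Θ₁)    Γ ⊢ τ₂ :: φ ⇝ (τ₂′, Θ₂)
    ----------------------------------------------------
    Γ ⊢ τ₁, τ₂ :: φ ⇝ ((τ₁′, τ₂′), Θ₁ ⊎ Θ₂)

    Γ ⊢ τ₁ :: φ ⇝ (τ₁′, Θ₁)    Γ ⊢ τ₂ :: φ ⇝ (τ₂′, Θ₂)
    ----------------------------------------------------
    Γ ⊢ τ₁|τ₂ :: φ ⇝ (τ₁′|τ₂′, Θ₁ ⊎ Θ₂)

    Γ ⊢ τ₁ :: φ ⇝ (τ₂, Θ)
    ----------------------------
    Γ ⊢ τ₁* :: φ ⇝ (τ₂*, Θ)

    Γ ⊢ E(X) :: φ ⇝ (τ′, Θ)
    ----------------------------
    Γ ⊢ X :: φ ⇝ (τ′, Θ)

Figure 17. Typechecking rules for path filters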
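The path filter rules can be read as a recursive function over the input type that replaces each matching atomic type with a fresh type variable and records what the variable stood for. The Python sketch below follows that reading under several assumptions that are not taken from the paper: types are encoded as tuples, a test φ is an element label or the wildcard "*", atomic subtyping against a test is approximated by the hypothetical helper `matches`, and type definitions E are a dictionary. It also checks the property stated in Lemma 6(2), that applying the substitution to the output type recovers the input type.

```python
import itertools


def matches(alpha, phi):
    # Simplifying assumption: only element types can satisfy a label test.
    return alpha[0] == "elem" and (phi == "*" or alpha[1] == phi)


def filter_type(gamma, tau, phi, defs, fresh):
    """Return (tau', theta) for the judgment gamma |- tau :: phi ~> (tau', theta)."""
    kind = tau[0]
    if kind == "empty":
        return tau, {}
    if kind in ("elem", "string"):               # atomic types
        if matches(tau, phi):
            z = ("tyvar", next(fresh))           # Z fresh
            return z, {z: (gamma, tau)}          # Z |-> (gamma |> alpha)
        return tau, {}
    if kind in ("seq", "alt"):
        t1, th1 = filter_type(gamma, tau[1], phi, defs, fresh)
        t2, th2 = filter_type(gamma, tau[2], phi, defs, fresh)
        return (kind, t1, t2), {**th1, **th2}    # disjoint by freshness
    if kind == "star":
        t, th = filter_type(gamma, tau[1], phi, defs, fresh)
        return ("star", t), th
    if kind == "var":                            # unfold the definition E(X)
        return filter_type(gamma, defs[tau[1]], phi, defs, fresh)
    raise ValueError(kind)


def apply_subst(tau, theta):
    """tau<theta>: replace each type variable by the type it is bound to."""
    if tau[0] == "tyvar":
        return theta[tau][1]
    if tau[0] in ("seq", "alt"):
        return (tau[0], apply_subst(tau[1], theta), apply_subst(tau[2], theta))
    if tau[0] == "star":
        return ("star", apply_subst(tau[1], theta))
    return tau


b_type = ("elem", "b", ("string",))
tau = ("seq", ("elem", "a", ("empty",)), ("star", ("alt", b_type, ("string",))))
out_type, theta = filter_type({}, tau, "b", {}, itertools.count())
```

The recursion never descends into the content of an element type, so unfolding a type variable X terminates as long as recursive occurrences are guarded by an element constructor, which is an assumption of this sketch.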



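    Γ ⊢path {α} p ⇝ (α′, Θ)

    Γ ⊢path {α} . ⇝ (α, ∅)

    Γ ⊢path {α} p ⇝ (α′, Θ₁)    ⊢path {Θ₁} p′ ⇝ (Θ₂, Θ₂′)
    -------------------------------------------------------
    Γ ⊢path {α} p/p′ ⇝ (α′⟨Θ₂⟩, Θ₂′)

    Γ ⊢path {α} p ⇝ (α′, Θ)    ⊢ {Θ} e : bool
    --------------------------------------------
    Γ ⊢path {α} p[e] ⇝ (α′⟨Θ?⟩, Θ)

    Γ ⊢path {α} p ⇝ (α′, Θ)
    ---------------------------------------
    Γ ⊢path {α} x as p ⇝ (α′, Θ ⊕ x)

    Γ ⊢ τ :: φ ⇝ (τ′, Θ)
    ---------------------------------------
    Γ ⊢path {n[τ]} φ ⇝ (n[τ′], Θ)

Figure 16. Typechecking rules for paths

    ⊢ {Θ} e : τ

    ⊢ {∅} e : τ

    ⊢ {Θ} e : τ    Γ ⊢ e : τ
    ------------------------------
    ⊢ {Θ, Z ↦ (Γ ▷ α)} e : τ

    ⊢₁ {Θ} s {Θ′}

    ⊢₁ {∅} s {∅}

    ⊢₁ {Θ} s {Θ′}    Γ ⊢₁ {α} s {τ}
    ----------------------------------------------
    ⊢₁ {Θ, Z ↦ (Γ ▷ α)} s {Θ′, Z ↦ (Γ ▷ τ)}

    ⊢stmt {Θ} s {Θ′}

    ⊢stmt {∅} s {∅}

    ⊢stmt {Θ} s {Θ′}    Γ ⊢stmt {τ} s {τ′}
    ------------------------------------------------
    ⊢stmt {Θ, Z ↦ (Γ ▷ τ)} s {Θ′, Z ↦ (Γ ▷ τ′)}

    ⊢path {Θ} p ⇝ (Θ′, Θ″)

    ⊢path {∅} p ⇝ (∅, ∅)

    ⊢path {Θ} p ⇝ (Θ′, Θ″)    Γ ⊢path {α} p ⇝ (α′, Θ‴)
    ------------------------------------------------------------
    ⊢path {Θ, Z ↦ (Γ ▷ α)} p ⇝ (Θ′, Z ↦ (Γ ▷ α′), Θ″ ⊎ Θ‴)

Figure 18. Simultaneous typechecking judgments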



